Translation Control Elements for High-level Protein
Expression in the Plastids of Higher Plants and Methods
of Use Thereof



This application claims priority from United States
Provisional Applications 60/095,163, filed August 3,
1998, 60/112,257, filed December 15, 1998, 60/095,167,
filed August 3, 1998, 60/131,611, filed April 29, 1999,
and 60/138,764, filed June 11, 1999, under 35 U.S.C.
§119(e). The entire disclosures of each of the
foregoing are incorporated by reference herein.

Pursuant to 35 U.S.C. §202(c) it is acknowledged
that the U.S. Government has certain rights in the
invention described herein, which was made in part with
funds from the National Science Foundation, Grant Number
MCB-96-30763.

FIELD OF THE INVENTION 

This invention relates to the fields of transgenic
plants and molecular biology. More specifically, the
invention provides vectors targeting the plastid genome
which contain translation control elements facilitating
high levels of protein expression in the plastids of
higher plants. Both monocots and dicots are
successfully transformed with the DNA constructs
provided herein.



BACKGROUND OF THE INVENTION 

Several publications are referenced in this
application in order to more fully describe the state of
the art to which this invention pertains. The
disclosure of each of these publications is incorporated
by reference herein.

The chloroplasts of higher plants accumulate
individual components of the photosynthetic machinery as
a relatively large fraction of total cellular protein.
The best example is the enzyme ribulose-1,5-bisphosphate
carboxylase-oxygenase (Rubisco) involved in CO2 fixation,
which can make up 65% of the total leaf protein (Ellis,
R.J. 1979). Because of the potentially attainable high
protein levels, there is significant interest in
exploring chloroplasts as an alternative system for
protein expression. To date, protein levels expressed
from transgenes in chloroplasts are below the levels of
highly expressed chloroplast genes. The highest levels
reported thus far in leaves are as follows: 1% of
neomycin phosphotransferase (Carrer et al., 1993); 2.5% of
β-glucuronidase (Staub and Maliga, 1993); and 3-5% of
Bacillus thuringiensis (Bt) crystal toxins (McBride et
al., 1995). An alternative system, based on a
nuclear-encoded, plastid-targeted T7 RNA polymerase, may
offer higher levels of protein expression (McBride et
al., 1994), although this yield may come at a price.
In bacteria, the rate-limiting step of protein
synthesis is usually the initiation of translation,
involving the binding of the initiator tRNA
(formyl-methionyl-tRNAf) and mRNA to the 70S ribosome,
recognition of the initiator codon, and the precise
phasing of the reading frame of the mRNA. Translation
initiation depends on three initiation factors (IF1,
IF2, IF3) and requires GTP. The 30S subunit is guided to
the initiation codon by RNA-RNA base pairing between the
3' end of the 16S rRNA and the mRNA ribosome binding site,
or Shine-Dalgarno (SD) sequence, located about 10
nucleotides upstream of the translation initiation codon
(Voorma, 1996). RNA-RNA interaction between the
"downstream box" (DB), a 15 nt sequence downstream of
the AUG translational initiation codon, and complementary
sequences in the 16S rRNA 3' sequence or anti-downstream
box (ADB; nucleotide positions 1469-1483) may also
facilitate loading of the mRNA onto the 30S ribosome
subunit (Sprengart et al., 1996). In addition, specific
protein-RNA interactions may also facilitate translation
initiation (Voorma, 1996).
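
The DB-ADB interaction described above can be illustrated with a short sketch that counts potential base pairs between an mRNA segment and an rRNA segment. This is only an illustration under stated assumptions: the alignment is ungapped and antiparallel, only Watson-Crick and G-U pairs are scored, and the function name `count_pairs` and the toy sequences are hypothetical rather than taken from this application.

```python
# Sketch: count potential base pairs between an mRNA segment and an rRNA
# segment aligned antiparallel, scoring Watson-Crick and G-U wobble pairs.
# The ungapped alignment and the toy sequences are illustrative assumptions.

WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(mrna_5to3: str, rrna_5to3: str) -> tuple[int, int]:
    """Return (watson_crick, wobble) counts for an ungapped antiparallel alignment.

    Both sequences are given 5'->3'. The rRNA is reversed so that position i
    of the mRNA faces position i of the reversed rRNA.
    """
    mrna = mrna_5to3.upper().replace("T", "U")
    rrna = rrna_5to3.upper().replace("T", "U")[::-1]
    watson_crick = wobble = 0
    for m, r in zip(mrna, rrna):
        if (m, r) in WATSON_CRICK:
            watson_crick += 1
        elif (m, r) in WOBBLE:
            wobble += 1
    return watson_crick, wobble


# Hypothetical toy sequences (not sequences from the figures).
print(count_pairs("AUGGCU", "AGUCAU"))
```

A fuller treatment would slide the mRNA window along the rRNA region and keep the best-scoring register, since several alternative DB-ADB pairings may exist for one mRNA.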

Key components of the prokaryotic translation
machinery have been identified in plastids, including
homologues of the bacterial IF1, IF2 and IF3 initiation
factors and an S1-like ribosomal protein (Stern et al.,
1997). Most plastid mRNAs (92%) contain a ribosome
binding site or SD sequence: GGAGG, or its truncated
tri- or tetranucleotide variant. This sequence is
similar to the bacterial SD consensus 5'-UAAGGAGGUGA-3'
(Voorma, 1996). High level expression of foreign genes
of interest in the plastids of higher plants is
extremely desirable. The present invention provides
novel genetic translational control elements for use in
plastid transformation vectors. Incorporation of these
elements into such vectors results in protein expression
levels comparable to those observed for highly expressed
chloroplast genes in both monocots and dicots.
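
As an illustration of the SD sequence and its truncated variants mentioned above, the following sketch searches the 3' end of an mRNA leader for the longest SD-like motif. It is a minimal sketch under assumptions: "truncated variant" is taken to mean any contiguous tri- or tetranucleotide of GGAGG, the 20 nt search window is arbitrary, and the names `sd_variants` and `find_sd` are hypothetical.

```python
# Sketch: locate an SD-like motif (GGAGG or a contiguous 3-4 nt piece of it)
# near the 3' end of a leader sequence. The leader is assumed to end
# immediately before the start codon; window size is an arbitrary assumption.

SD_CORE = "GGAGG"


def sd_variants(core: str = SD_CORE, min_len: int = 3) -> list[str]:
    """Contiguous substrings of the core, longest first, without duplicates."""
    variants = []
    for size in range(len(core), min_len - 1, -1):
        for start in range(len(core) - size + 1):
            sub = core[start:start + size]
            if sub not in variants:
                variants.append(sub)
    return variants


def find_sd(leader: str, window: int = 20):
    """Return (motif, spacing) for the longest motif found, or None.

    spacing is the number of nucleotides between the 3' end of the motif
    and the start codon.
    """
    seq = leader.upper().replace("U", "T")
    region = seq[-window:]
    for motif in sd_variants():
        pos = region.rfind(motif)
        if pos != -1:
            return motif, len(region) - (pos + len(motif))
    return None


# Hypothetical leader, not a sequence from this application.
print(find_sd("AAAAAGGAGGTTTTTTT"))
```

Preferring the longest motif is a design choice made here for simplicity; short trinucleotide matches occur frequently by chance, so any hit would need to be judged against its spacing from the start codon.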

SUMMARY OF THE INVENTION 

5' genetic regulatory regions contain promoters
with distinct DNA sequence information which facilitates
recognition by the RNA polymerase and translational
control elements which facilitate translation. Both of
these components act together to drive gene expression.

In accordance with the present invention, chimeric 
5' regulatory regions have been constructed which
incorporate translation control elements. Incorporation
of these chimeric 5' regulatory regions into plastid
transforming vectors followed by transformation of 
target plant cells gives rise to dramatically enhanced 
levels of protein expression. These chimeric 5' 
regulatory regions may be used to advantage to express 
foreign genes of interest in a wide range of plant 
tissues. It is an object of the present invention to 
provide DNA constructs and methods for stably 
transforming plastids of multicellular plants containing 
such promoters. 

In one embodiment of the invention, recombinant DNA
constructs for expressing at least one heterologous
protein in the plastids of higher plants are provided.
The constructs comprise a 5' regulatory region which
includes a promoter element, a leader sequence and a
downstream box element operably linked to a coding
region of said at least one heterologous protein. The
chimeric regulatory region acts to enhance translational
efficiency of an mRNA molecule encoded by said DNA
construct. Vectors comprising the DNA constructs are
also contemplated in the present invention. Exemplary
DNA constructs of the invention include the following
chimeric regulatory regions: PrrnLatpB+DBwt,
PrrnLatpB-DB, PrrnLatpB+DBm, PrrnLclpP+DBwt, PrrnLclpP-DB,
PrrnLrbcL+DBwt, PrrnLrbcL-DB, PrrnLrbcL+DBm,
PrrnLpsbB+DBwt, PrrnLpsbB-DB, PrrnLpsbA+DBwt,
PrrnLpsbA-DB, PrrnLpsbA-DB(+GC), PrrnLT7g10+DB/Ec,
PrrnLT7g10+DB/pt, and PrrnLT7g10-DB. Downstream box
sequences preferred for use in the constructs of the
invention have the following sequences:

5'TCCAGTCACTAGCCCTGCCTTCGGCA3' and
5'CCCAGTCATGAATCACAAAGTGGTAA3'.

The 5' regulatory segments of the invention have
been successfully employed to drive the expression of
the bar gene from S. hygroscopicus in the plastids of
higher plants. Synthetic bar genes have also been
generated and expressed using the DNA constructs of the
present invention. These constructs have been
engineered to maximize transgene containment in plastids
by incorporating rare codons into the coding region that
are not preferred for protein translation in
microorganisms and fungi.

In yet another embodiment of the invention, at 
least one fusion protein is produced utilizing the DNA 
constructs of the invention. An exemplary fusion 
protein has a first and second coding region operably 
linked to the 5' regulatory regions described herein 
such that production of said fusion protein is regulated 
by said 5' regulatory region. In one embodiment the 
first coding region encodes a selectable marker gene and 
the second coding region encodes a fluorescent molecule 
to facilitate visualization of transformed plant cells. 
Vectors comprising a DNA construct encoding such a 
fusion protein are also within the scope of the present 
invention. An exemplary fusion protein consists of an aadA
coding region operably linked to a green fluorescent 
protein coding region. These moieties may be linked by 
peptide linkers such as ELVEGKLELVEGLKVA and 
ELAVEGKLEVA. 

Plasmids for transforming the plastids of higher
plants are also included in the present invention.
Exemplary plasmids are selected from the group
consisting of pHK30(B), pHK31(B), pHK60, pHK32(B),
pHK33(B), pHK34(A), pHK35(A), pHK64(A), pHK36(A),
pHK37(A), pHK38(A), pHK39(A), pHK40(A), pHK41(A),
pHK42(A), pHK43(A), pMSK56, pMSK57, pMSK48, pMSK49,
pMSK35, pMSK53 and pMSK54.

Transgenic plants, both monocots and dicots,
harboring the plasmids set forth above are also
contemplated to be within the scope of the invention.

In yet another embodiment of the invention, methods
are provided for producing transplastomic monocots. One
method comprises a) obtaining embryogenic cells;
b) exposing said cells to a heterologous DNA molecule
under conditions whereby said DNA enters the plastids of
said cells, said heterologous DNA molecule encoding at
least one exogenous protein, said at least one exogenous
protein encoding a selectable marker; c) applying a
selection agent to said cells to facilitate sorting of
untransformed plastids from transformed plastids, said
cells containing transformed plastids surviving and
dividing in the presence of said selection agent; d)
transferring said surviving cells to selective media to
promote plant regeneration and shoot growth; and e)
rooting said shoots, thereby producing transplastomic
monocot plants. The heterologous DNA molecule may be
introduced into the plant cell via a process selected
from the group consisting of biolistic bombardment,
Agrobacterium-mediated transformation, microinjection
and electroporation. In one embodiment of the above
described method, protoplasts are obtained from the
embryogenic cells and the heterologous DNA molecule is
delivered to said protoplasts by exposure to
polyethylene glycol. Suitable selection agents for the
practice of the methods of the invention are
streptomycin and paromomycin. Monocot plants which may
be transformed using the methods of the invention
include but are not limited to maize, millet, sorghum,
sugar cane, rice, wheat, barley, oat, rye, and turf
grass.

In a preferred embodiment a method for producing
transplastomic rice plants is provided. This method
entails the following steps: a) obtaining embryogenic
calli; b) inducing proliferation of calli on modified
CIM medium; c) obtaining embryogenic cell suspensions
of said proliferating calli in liquid AA medium;
d) bombarding said embryogenic cells with
microprojectiles coated with plasmid DNA;
e) transferring said bombarded cells to selective liquid
AA medium; f) transferring said cells surviving in AA
medium to selective RRM regeneration medium for a time
period sufficient for green shoots to appear; and
g) rooting said shoots in a selective MS salt medium.
Plasmids suitable for transforming rice as set
forth above include pMSK35, pMSK53, pMSK54 and
pMSK49. Transplastomic rice plants so produced are also
contemplated to be within the scope of the invention.

In yet a final embodiment of the invention, methods
for containing transgenes in transformed plants are
provided. An exemplary method includes the following
steps: a) determining the codon usage in said plant to
be transformed and in microbes found in association with
said plant; and b) genetically engineering said
transgene sequence via the introduction of rare
microbial codons to abrogate expression of said
transgene in said plant-associated microbe. In an
exemplary embodiment of the method described immediately
above, the transgene is a bar gene and said rare codons
are arginine-encoding codons selected from the group
consisting of AGA and AGG, and the transgene is not
expressed in E. coli.
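
The two steps of the containment method above can be sketched in code: tabulating codon usage, then recoding arginine codons to AGA/AGG. This is a hedged illustration, not the procedure actually used to design the synthetic bar genes. In particular, replacing every arginine codon and alternating between AGA and AGG are assumptions made for the example, and the function names and the toy coding sequence are hypothetical.

```python
# Sketch: (a) codon frequency per 1000 codons, (b) recoding arginine codons
# to AGA/AGG. Replacing every arginine codon and alternating AGA/AGG are
# illustrative assumptions; the toy sequence is hypothetical.
from collections import Counter

ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def split_codons(cds: str) -> list[str]:
    """Split an in-frame coding sequence into triplets (DNA alphabet)."""
    cds = cds.upper().replace("U", "T")
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    return [cds[i:i + 3] for i in range(0, len(cds), 3)]


def frequency_per_thousand(cds: str) -> dict[str, float]:
    """Triplet frequency per 1000 codons in the given coding sequence."""
    counts = Counter(split_codons(cds))
    total = sum(counts.values())
    return {codon: 1000.0 * n / total for codon, n in counts.items()}


def recode_arginines(cds: str, replacements=("AGA", "AGG")) -> str:
    """Replace each arginine codon, cycling through the replacement codons."""
    out = []
    used = 0
    for codon in split_codons(cds):
        if codon in ARG_CODONS:
            out.append(replacements[used % len(replacements)])
            used += 1
        else:
            out.append(codon)
    return "".join(out)


toy_cds = "ATGCGTCGCTAA"  # hypothetical: Met-Arg-Arg-stop
print(frequency_per_thousand(toy_cds))
print(recode_arginines(toy_cds))
```

Because all six codons in `ARG_CODONS` encode arginine, the recoded sequence specifies the same protein. In practice the usage tables would be computed from the host plastid genes and from the associated microbes rather than from the transgene itself.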

The following definitions will facilitate the
understanding of the subject matter of the present
invention:

Heteroplastomic: refers to the presence of a mixed
population of different plastid genomes within a single
plastid or in a population of plastids contained in
plant cells or tissues.

Homoplastomic: refers to a pure population of
plastid genomes, either within a plastid or within a
population contained in plant cells and tissues.
Homoplastomic plastids, cells or tissues are genetically
stable because they contain only one type of plastid
genome. Hence, they remain homoplastomic even after the
selection pressure has been removed, and selfed progeny
are also homoplastomic. For purposes of the present
invention, heteroplastomic populations of genomes that
are functionally homoplastomic (i.e., contain only minor
populations of wild-type DNA or transformed genomes with
sequence variations) may be referred to herein as
"functionally homoplastomic" or "substantially
homoplastomic." These types of cells or tissues can be
readily purified to a homoplastomic state by continued
selection.

Plastome: the genome of a plastid.

Transplastome: a transformed plastid genome.

Transformation of plastids: stable integration of
transforming DNA into the plastid genome that is
transmitted to the seed progeny of plants containing the
transformed plastids.

Selectable marker gene: the term "selectable
marker gene" refers to a gene that upon expression
confers a selective advantage to the plastids and a
phenotype by which successfully transformed plastids or
cells or tissues carrying the transformed plastid can be
identified.

Transforming DNA: refers to homologous DNA, or
heterologous DNA flanked by homologous DNA, which when
introduced into plastids becomes part of the plastid
genome by homologous recombination.

Operably linked: refers to two different regions
or two separate genes spliced together in a construct
such that both regions will function to promote gene
expression and/or protein translation.

The detailed description as follows provides 
examples of preferred methods for making and using the 
DNA constructs of the present invention and for 
practicing the methods of the invention. Any molecular 
cloning and recombinant DNA techniques not specifically 
described are carried out by standard methods, as 
generally set forth, for example, in Sambrook et al.,
"DNA Cloning, A Laboratory Manual," Cold Spring Harbor 
Laboratory, 1989. 

BRIEF DESCRIPTION OF THE DRAWINGS 

Figure 1A. Plastid mRNAs and the small (16S)
ribosomal RNA contain complementary sequences downstream
of AUG implicating interactions between mRNA and 16S
rRNA during translation initiation in plastids. Proposed
model is based on data in E. coli (Sprengart et al.,
1996); for sequence of 16S rRNA see ref. (Shinozaki et
al., 1986b). SD, Shine-Dalgarno sequence; ASD, anti SD
region; DB, downstream box; ADB, anti DB region.
Watson-Crick (line) and G-U (closed circle) pairing are marked.

Figure 1B. Sequence of the anti-downstream-box
regions (ADB sequence underlined) of the 16S rRNA in
plastids (pt; this application) and in E. coli (Ec;
Sprengart et al., 1996). The E. coli ADB box contains
sequences between nucleotides 1469-1483 of the 16S rRNA
(Sprengart et al., 1996), corresponding to nucleotides
1416-1430 of the tobacco 16S rRNA (Dams et al., 1988;
sequence between nucleotides 104173-104187 in Shinozaki
et al., 1986).

Figure 2A. Base-pairing between plastid ADB and
atpB, clpP, rbcL, psbB and psbA mRNAs (underlined).
Multiple alternative DB-ADB interactions are shown.
Nucleotides changed to reduce or alter mRNA-rRNA
interaction are in lower case. The number of potential
nucleotide pairs formed with the 26 nt ADB region is in
parenthesis. The number of pairing events affected by
mutagenesis is in bold.

Figure 2B. Complementarity of Prrn T7 phage gene 10
leader derivatives with the E. coli and plastid ADB
sequences. Nucleotides changed to reduce or alter
mRNA-rRNA interaction are in lower case. The number of
potential nucleotide pairs formed with the 26 nt ADB
region is in parenthesis.

Figure 3A. DNA sequence of the chimeric Prrn
plastid promoter fragments with atpB and clpP
translation control regions. The plasmid name that is
the source of the promoter fragment is given in
parenthesis. The Prrn promoter sequence is underlined;
nucleotide at which transcription initiates in tobacco
plastids is marked with filled circle; translational
initiation codon (AUG) is in bold; SD is underlined with
a wavy line; nucleotides of the 5' and 3' restriction
sites and point mutations are in lower case.

Figure 3B. DNA sequence of the chimeric Prrn
plastid promoter fragments with rbcL and psbB
translation control regions. For details see description
of Fig. 3A.

Figure 3C. DNA sequence of the chimeric Prrn
plastid promoter fragments with psbA translation control
regions. For details see description of Fig. 3A.

Figure 3D. DNA sequence of the chimeric Prrn
plastid promoter fragments with the T7 phage gene 10
(PrrnLT7g10+DB/Ec), plastid (PrrnLT7g10+DB/pt) and
synthetic DB (PrrnLT7g10-DB). For details see
description of Fig. 3A.

Figure 4A. Plastid transformation vector pPRV111A
with chimeric neo genes. Plasmid serial numbers, for
example pHK34, designate pPRV111A plastid transformation
vector derivatives; adjacent plasmid numbers in
parenthesis (e.g. pHK14) designate the source of the
chimeric neo gene in pUC118 or pBSIIKS+ vectors. Arrows
mark orientation of the selectable marker gene (aadA)
and of the chimeric neo gene. Plastid targeting
sequences are underlined in bold. Components of the
chimeric neo genes are: Prrn, rRNA operon promoter
fragment; L, leader sequence; DB, downstream box; NheI
site which serves as a synthetic DB is marked by a heavy
line; neo, neomycin phosphotransferase coding region;
TrbcL, rbcL 3'-untranslated region. 16SrDNA, trnV,
rps12/7 are plastid genes (Shinozaki et al., 1986).
Restriction sites are marked for: EcoRI, SphI, StuI, SacI,
NheI, NcoI, XbaI, HindIII, BamHI and BglII. Restriction
sites in brackets were eliminated during construction.
The neo translation initiation codon in plasmid pHK36 is
included in the NcoI site (not marked). The presence and
relative order of NheI (**) and NcoI (*) restriction
sites in the plasmid pPRV111A -DB derivatives (pHK35,
pHK37, pHK40, pHK42, pHK43) are marked by asterisks. The
promoter sequences are shown in Figures 3B, C and D.

Figure 4B. Plastid transformation vector pPRV111B
with chimeric neo genes. See description of Fig. 4A. The
promoter sequences are shown in Fig. 3A.

Figure 5. Construction of Prrn promoter-plastid 
leader fragments by overlap extension PCR. 

Figure 6. Construction by PCR of the
PrrnLT7g10+DB/Ec promoter (SacI-NheI fragment) in
plasmid pHK18.

Figure 7. Construction by PCR of the
PrrnLT7g10+DB/pt promoter (SacI-NheI fragment) in
plasmid pHK19.

Figure 8. Restriction map of plasmids pHK2 and pHK3
with the Prrn(L)rbcL(S)::neo::TrbcL gene. Restriction
enzyme cleavage sites are marked for: BamHI, EcoRI,
HindIII, NcoI, NheI, SacI, XbaI.

Figure 9. DNA sequence of the
Prrn(L)rbcL(S)::neo::TrbcL gene in plasmid pHK3. Plasmid
pHK2 carries an identical neo gene, except that there is
an EcoRI site upstream of the SacI site.

Figure 10. NPTII accumulation in tobacco leaves
detected by protein gel blot analysis. Amount of total
soluble leaf protein (μg) loaded on SDS-PAGE gel is
indicated above the lanes. Lanes are designated with
plasmid used for plant transformation; μg protein loaded
per lane is given below. NPTII standard and Nt-pTNH32
extracts were run as positive controls; extracts from
wild-type non-transformed plants (wt) were used as
negative controls.

Figure 11. The levels of neo mRNA in the
transplastomic leaves. The blots were probed for neo
(top) and cytoplasmic 25S rRNA as loading control
(bottom). Positions of the monocistronic neo mRNA in
vector pPRV111A (Figure 4A), the monocistronic neo and
dicistronic neo-aadA transcripts in vector pPRV111B
(Figure 4B) and the monocistronic neo and dicistronic
rbcL-neo transcripts in pTNH32 transformed plants
(Carrer et al., 1993) are marked. Lanes are designated
with the transgenic plant serial number. 4 μg total
cellular RNA was loaded per lane.

Figure 12. Fraction of a codon encoding a 
particular amino acid and triplet frequency per 1000 
codons in the mutagenized atpB and rbcL DB region. 
Altered nucleotides are in lower case. 

Figure 13A. NPTII accumulation in tobacco roots
detected by protein gel blot analysis. Lanes are
designated with the plasmid used for plant
transformation; μg protein loaded per lane is given
below. NPTII standard was run as positive control;
extracts from wild-type non-transformed plants (wt) were
used as negative controls.

Figure 13B. Steady-state levels of neo mRNA in
tobacco roots. The neo probe detects a monocistronic
mRNA in plants transformed with vector pPRV111A (Figure
4A), and a monocistronic neo and a dicistronic neo-aadA
transcript in plants transformed with vector pPRV111B
(Figure 4B). Lanes are designated with the transgenic
plant serial number. 4 μg total cellular RNA was loaded
per lane.

Figure 14. Protein gel blot analysis to detect
NPTII accumulation in tobacco seeds. Lanes are
designated with plasmid used for plant transformation;
μg protein loaded per lane is given below. NPTII
standard was run as positive control; extracts from
wild-type non-transformed plants (wt) were used as
negative controls.

Figure 15A. Diagram showing integration of the
chimeric neo and aadA genes into the plastid genome by
two homologous recombination events via the plastid
targeting sequences (underlined). On top is shown a
diagram of plasmids pHK30 and pHK32, which are plastid
transformation vector pPRV111B derivatives (Zoubenko et
al., 1994). Horizontal arrows mark gene orientation. For
description of chimeric neo genes, see Figure 4B.
16SrDNA, trnV, rps12/7 are plastid genes (Shinozaki et
al., 1986). Restriction sites are marked for: EcoRI (E),
SacI (S), NheI (N), XbaI (X), HindIII (H), BamHI (Ba)
and BglII. Restriction sites in brackets were eliminated
during construction. In the middle the wild-type plastid
DNA region (Wt-ptDNA) targeted for insertion is shown.
Lines connecting plasmids and ptDNA mark sites of
homologous recombination at the end of the vector
plastid-targeting regions. The transformed plastid
genome segment (T-ptDNA) map is shown on the bottom.

Figure 15B. DNA gel blot analysis confirms
integration of the neo and aadA genes into the plastid
genome. The blot on top was probed with the plastid
targeting sequence (Probe 1 in Figure 15A). It lights up
4.2-kb and 1.4-kb fragments in transplastomic lines, and
a 3.1-kb fragment in wild-type (see Figure 15A). Note
that the 1.4-kb signal is weak in most clones. The blot
on the bottom was probed for neo sequences, which are
present only in the transplastomic lines.

Figure 16A. Diagram showing integration of the bar
gene into the tobacco plastid genome. Map of the plastid
targeting region in plasmid pJEK6 is shown on top. The
targeted region of the wild-type plastid genome
(wt-ptDNA) is shown in the middle. Integrated transgenes in
the transplastome (T-ptDNA) are shown at the bottom. Map
positions are shown for: the bar gene; aadA, the
selectable spectinomycin resistance gene; 16SrDNA and
rps12/7, plastid genes (Shinozaki et al., 1986). Arrows
indicate direction of transcription. Map position of the
probe (2.5 kb) is marked by a heavy line; the wild-type
(2.9-kb) and transgenic (3.3-kb, 1.9-kb) fragments
generated by SmaI and BglII digestion are marked by thin
lines.

Figure 16B. DNA gel blot confirms integration of
bar into the tobacco plastid genome. Data are shown for
transplastomic lines Nt-pJEK6-2A through E, Nt-pJEK6-5A
through E and Nt-pJEK6-13A and B, and the wild-type
parental line. SmaI-BglII digested total cellular DNA
was probed with the 2.5-kb ApaI-BamHI plastid targeting
sequence marked with heavy line in Figure 16A.

Figure 17. PAT assay confirms bar expression in
tobacco plastids. PAT activity was determined by
conversion of PPT into acetyl-PPT using radiolabeled
14C-Acetyl-CoA. Data are shown for transplastomic lines
Nt-pJEK6-2D, Nt-pJEK6-5A and Nt-pJEK6-13B, nuclear
transformant Nt-pDM307-10 and wild-type (wt).

Figure 18A. Transplastomic tobacco plants are
herbicide resistant. Wild-type and pJEK6-transformed
plants 13 days after Liberty spraying (5 ml, 2%
solution).

Figure 18B. Maternal inheritance of PPT resistance
in the seed progeny. Seeds from reciprocal crosses with
Nt-pJEK6-5A plants germinated on 0, 10 and 50 mg/L PPT.
wt x pJEK6-5A, transplastomic line used as pollen parent;
pJEK6-5A x wt, transplastomic line used as female parent.
Resistant seedlings are green on PPT medium, sensitive
seedlings are bleached.

Figure 19. The engineered bacterial bar coding
region DNA sequence in plasmid pJEK3 and pJEK6 and
encoded amino acid sequence. Nucleotides encoding the
rbcL five N-terminal amino acids are in lower case.
Nucleotides added at the 3' end during construction are
also in lower case. NcoI, BglII and XbaI cloning sites
are marked.

Figure 20A. The synthetic bar gene DNA sequence and
the encoded amino acid sequence. The arginines encoded
by AGA/AGG codons are in bold. Original nucleotides are
in capital letters, altered bases are in lower case.
Restriction sites used for cloning are marked.

Figure 20B. The synthetic s2-bar gene DNA sequence
and the encoded amino acid sequence. The arginines
encoded by AGA/AGG codons are in bold. Original
nucleotides are in capital letters, altered bases are in
lower case. Restriction sites used for cloning are
marked.

Figure 21. Synthetic and bacterial bar genes. The
bar coding region is expressed in the Prrn/TrbcL
cassettes. Note that the Prrn promoters differ with
respect to the translational control region.

Figure 22A. PAT is expressed in E. coli from bar,
but not from s-bar coding region. PAT activity was
determined by conversion of PPT into acetyl-PPT using
radiolabeled 14C-Acetyl-CoA. Data are shown for E. coli
transformed with plasmids pJEK6 and pKO12 carrying the
bar gene, and pKO8, carrying s-bar.

Figure 22B. PAT assay confirms expression of bar
and s-bar in tobacco plastids. PAT activity was
determined by conversion of PPT into acetyl-PPT using
radiolabeled 14C-Acetyl-CoA. Data are shown for
transplastomic lines Nt-pJEK6-13B and Nt-pKO3-24a,B
carrying bar and s-bar, respectively.

Figure 23A. Plastid transformation vector with
FLARE16-S as selectable marker targeting the plastid
inverted repeat region. DNA and protein sequence at the
aadA-gfp junction. Nucleotides derived from aadA and gfp
are in capitals; adapter sequences and the point
mutation used to create the BstXI restriction site
(bold) are in lower case.

Figure 23B. Physical map of plastid transformation
vector with FLARE16-S as selectable marker targeting the
plastid inverted repeat region. Shown are: the promoter
(P) and 3'UTR (T) of the aadA16pt-gfp coding region and
its component parts (aadA and gfp coding regions); rrn16
and rps12/7 plastid genes; restriction endonuclease
sites HindIII (removed), SpeI, XbaI, NcoI, BstXI, NheI,
EcoRI. In plasmid pMSK56 aadA16pt-gfp is expressed from
the Prrn:LatpBDB promoter and encodes FLARE16-S1. In
plasmid pMSK57 aadA16pt-gfp is expressed from the
Prrn:LrbcLDB promoter and encodes FLARE16-S2.

Figure 24. Localization of FLARE16-S to tobacco
plastids by laser scanning confocal microscopy in
heteroplastomic tissue. Images were processed to detect
FLARE16-S (green) and chlorophyll fluorescence (red) and
both in a merged view. Sections are shown from plants
expressing FLARE16-S1 (a,b) and FLARE16-S2 (c-f). Note
wild-type and transformed plastids in leaves (a,c,d),
chromoplasts of petals (b), trichomes (e) and
non-green root plastids (f). White arrows mark
transplastomic organelles. Bars represent 25 μm.

Figure 25. Immunoblot analysis of FLARE16-S
accumulation in chloroplasts. The amount of loaded
protein (μg) is indicated above the lanes.
Quantification of FLARE16-S1 (Nt-pMSK56 plants) and
FLARE16-S2 (Nt-pMSK57 plants) is based on comparison
with a purified GFP dilution series. Extract from a
wild-type plant (Nt) was used as negative control.

Figure 26A. Amplification of border fragments
confirms integration of FLARE-S genes into the plastid
genome. Maps of the plastid targeting regions of the
rice (pMSK49) and tobacco (pMSK57) vectors, the segment
of the rice and tobacco plastid genomes targeted by the
vectors (Os-wt and Nt-wt), and the same regions after
integration of FLARE-S genes. The ends of plastid
targeting regions are connected with cognate sequences
in the wild-type plastid genome. Plastid genes 16SrDNA,
trnV and rps12/7 are marked only in the wild-type
plastid genomes. The position of PCR primers (01-06) and
the PCR fragments generated by them are also shown.

Figure 26B. Amplification of border fragments
confirms integration of FLARE-S genes into the plastid
genome. Gels with PCR-amplified left and right border
fragments, and with aadA fragment. Results are shown for
rice (Os-pMSK49-1 and Os-pMSK49-2) and tobacco
(Nt-pMSK57) transplastomic lines and wild-type (Os-wt) rice.
The molecular weight marker is EcoRI- and
HindIII-digested λ DNA.



Figure 27. Localization of FLARE11-S3 to rice
chloroplasts in the Os-pMSK49-5 line by laser scanning
confocal microscopy. Images were processed to detect
FLARE11-S (green) and chlorophyll fluorescence (red) and
both in a merged view. Arrows point to mixed populations
of plastids in cells. Bar represents 25 μm.

Figure 28. The sequence of FLARE16-S is shown. 


Figure 29. The sequence of FLARE16-S1 is shown. 

Figure 30. The sequence of FLARE16-S2 is shown. 






Figure 31. The sequence of FLARE11-S is shown. 



Figure 32. The sequence of FLARE11-S3 is shown. 
Figures 33A and 33B. The sequence of pMSK35 is
shown.

Figures 34A and 34B. The sequence of pMSK49 is
shown.

Figure 35. A table describing the FLARE constructs 
of the invention. 

DETAILED DESCRIPTION OF THE INVENTION

DNA cassettes for high level protein expression in
plastids are provided herein. Higher plant plastid
mRNAs contain sequences within 50 nt downstream of AUG
that are complementary to the 16S rRNA 3'-region. These
complementary sequences are approximately at the same
position as DB sequences in E. coli mRNAs. See Figures
1A and 2A. Interestingly, the tentative plastid DB
sequence significantly deviates from the E. coli DB
consensus, since the tobacco plastid and E. coli 16S
rRNA sequences in the anti-downstream-box (ADB) region are
significantly different (Figure 1B). The feasibility
of improving protein expression by incorporating DB
sequences in plastids was assessed by constructing a
series of chimeric 5' regulatory regions consisting of
the plastid rRNA operon σ70-type promoter (Prrn-114; Svab
and Maliga, 1993; Vera and Sugiura, 1995) and the leader
sequence of plastid mRNAs with the native DB,
mutagenized DB and synthetic DB sequences. The plastid
mRNA leaders differ with respect to the presence and
position of the SD sequence. Translation efficiency
from the chimeric promoters was determined by expressing
the bacterial neo gene in plastids. The neo (or kan)
gene encodes neomycin phosphotransferase (NPTII) and
confers resistance to kanamycin in bacteria and plastids
(Carrer et al., 1993). We have found that NPTII from the
chimeric neo transcripts accumulates in the range of
0.2% to 23% of the total soluble leaf protein,
indicating the importance of translational control
signals in the mRNA 5' region for high-level protein
expression.

There is great interest in producing recombinant
proteins in plant plastids; thus far, such proteins have
been expressed from nuclear genes only (Arntzen, 1997;
Conrad and Fiedler, 1998; Kusnadi et al., 1997). Protein
levels produced from the PrrnLrbcL+DBwt and PrrnLT7g10
expression cassettes described here significantly exceed
protein levels reported for nuclear genes. Accumulation
of NPTII from nuclear genes is typically <<0.1% (Allen
et al., 1996), the highest value being 0.4% of the total
soluble protein (Houdt et al., 1997). We reported
earlier accumulation of 1% NPTII from a plastid neo
transgene (Carrer et al., 1993). Other examples of
protein accumulation from plastid transgenes are 2.5%
β-glucuronidase (GUS) (Staub and Maliga, 1993) and 3-5%
of the Bacillus thuringiensis (Bt) crystal toxins
(McBride et al., 1995). As compared to this earlier
report, we have achieved a significant increase in NPTII
levels, up to 23% of total soluble protein.

FLARE-S, a protein obtained by fusing an
antibiotic-inactivating enzyme with the Aequorea
victoria green fluorescent protein, accumulated to 8%
and 18% of total soluble protein from the PrrnLatpB+DBwt
and PrrnLrbcL+DBwt cassettes provided herein. See
Example 8. High-level protein accumulation from the
cassettes of the present invention can be clearly
attributed to engineering the translational control
region (TCR) of the chimeric genes. These novel genetic
elements may be used in different applications to drive
expression of proteins with agronomic, industrial or
pharmaceutical importance.

There is a strong demand for methods that control
the flow of transgenes in field crops. Incorporation of
the transgenes in the plastid genome rather than the
nuclear genome results in natural transgene containment,
since plastids are not transmitted via pollen in most
crops (Maliga, 1993). Plastid transformation in crops
has not been widely employed owing to the lack of
suitable technology. Enhanced expression of selective
markers should yield higher transformation efficiencies.
The chimeric promoters of the present invention
facilitate extension of plastid transformation to
agronomically and industrially important crops. Indeed,
high-level expression from the PrrnLatpB+DBwt cassette
described here resulted in an approximately 25-fold
increase in the frequency of kanamycin-resistant
transplastomic tobacco lines. More importantly, high
levels of marker gene expression following plastid
transformation have been obtained in rice, the first
cereal species in which plastid transformation has been
successful. The results are set forth in Example 8.

The following examples are provided to illustrate 
various embodiments of the present invention. They are 
not intended to limit the invention in any way. 

The protocols set forth below are provided to 
facilitate the practice of the present invention. 
PREPARATION OF CHIMERIC 5' CASSETTES FOR ELEVATED
EXPRESSION OF HETEROLOGOUS PROTEINS IN PLASTIDS OF
HIGHER PLANTS

Identification of a potential downstream box in
plastid mRNAs

The presence or absence of downstream box elements
in mRNA molecules was determined for the following
genes: psbB (Tanaka et al., 1987) and psbA (Sugita and
Sugiura, 1984), photosystem II genes; rbcL, encoding the
large subunit of ribulose-1,5-bisphosphate
carboxylase/oxygenase (Shinozaki and Sugiura, 1982);
atpB, encoding the ATPase β subunit (Orozco et al.,
1990); and clpP, encoding the proteolytic subunit of the
Clp ATP-dependent plastid protease (Hajdukiewicz et
al., 1997). Interestingly, most or all of the PclpP-53
promoter is downstream of the transcription initiation
site; therefore the PrrnLclpP constructs are assumed to
contain two promoters: Prrn-114 and PclpP-53.
Transcription initiation sites for these genes were
described in the references cited above; for the
nucleotide positions of the genes in the plastid genome
see Shinozaki et al., 1986.

Initially, it was assumed that the plastid ADB is
similar in size and position to the E. coli ADB in the
16S rRNA. The E. coli ADB is localized on a conserved
stem structure between nucleotides 1469 and 1483 (15 nt)
that corresponds to nucleotides 1416 to 1430 of the
plastid 16S rRNA (Dams et al., 1988; Sprengart et al.,
1996). Although in both cases the ADB is contained in
the 16S rRNA penultimate stem, the actual ADB sequence
is different in plastids and in E. coli (Figure 1B).
The N-terminal coding regions of plastid genes atpB,
clpP, rbcL, petA, psaA, psbA, psbB, psbD and psbE were
searched for potential DB sequences. The homology search
was carried out with a 26 nucleotide sequence centered
on the tentative DB region (Figure 1B). The search
revealed short stretches of imperfect homology with
alternative solutions. Since the position of DB in the
mRNA is quite flexible (Etchegaray and Inouye, 1999), we
show four potential DB-ADB interactions for atpB and
rbcL in Figure 2A. Two plastid mRNAs were selected to
test the role of DB in the translation of plastid mRNAs:
1) atpB mRNA lacks a SD sequence; and 2) rbcL mRNA
contains a SD sequence at the prokaryotic consensus. In
addition, the phage T7 gene 10 (T7g10) leader was
included in the study. This leader has a
well-characterized E. coli DB sequence (Figure 2B;
Sprengart et al., 1996). Additional plastid mRNAs with
potential DB sequences shown in Figure 2A are clpP, psbB
and psbA.
Experimental strategy to test the efficiency of leader 
sequences for translation 

To compare the efficiency of translation from the
5'-UTR of the selected genes, the 5'-UTR was cloned
downstream of the strong plastid rRNA operon σ70-type
promoter (Prrn-114) (Svab and Maliga, 1993; Allison et
al., 1996), which initiates transcription from multiple
adjacent nucleotides (-114, -113, -111; Sriraman et al.,
1998). The promoter fragments were constructed as
SacI-NheI or SacI-NcoI fragments. Construction of the
chimeric promoters using conventional molecular
biological techniques is set forth in detail in the next
section.

Two constructs were prepared for each 5'-UTR
selected: one with (+DB) and one without (-DB) a native
downstream box. As will be apparent from the discussion
below, the -DB constructs have a synthetic DB
provided by the NheI restriction site. The promoters
were cloned upstream of the coding region of a kanamycin
resistance (neo) gene, which is available on an
NheI-XbaI or NcoI-XbaI fragment. For the stabilization
of the mRNA, the rbcL gene 3'-untranslated region was
cloned downstream of neo as an XbaI-HindIII fragment.
The chimeric neo genes can therefore be excised from the
pUC118 or pBSIIKS+ plasmids as SacI-HindIII fragments.
These source plasmids are listed in Table 1.



Table 1. Salient features of chimeric promoters (a)

Source of 5'-UTR             SD   DB   Promoter        pUC118 (U) or   pPRV111A,B
(nucleotides from AUG)                 fragment        pBSIIKS+ (B)
---------------------------------------------------------------------------------
atpB (-90/+42)               -    wt   SacI/NheI       pHK10 (U)       pHK30 (B)
atpB (-90/+6)                -    s    SacI/NheI       pHK11 (U)       pHK31 (B)
atpB (-90/+42)               -    m    SacI/NheI       pHK50 (B)       pHK60 (B)
clpP (-53/+48)               -    wt   SacI/NheI       pHK12 (U)       pHK32 (B)
clpP (-53/+6)                -    s    SacI/NheI       pHK13 (U)       pHK33 (B)
rbcL (-58/+42)               +    wt   SacI/NheI       pHK14 (B)       pHK34 (A)
rbcL (-58/+6)                +    s    SacI/NheI       pHK15 (U)       pHK35 (A)
rbcL (-58/+42)               +    m    SacI/NheI       pHK54 (B)       pHK64 (A)
psbB (-54/+45)               +    wt   SacI/NheI (d)   pHK16 (U)       pHK36 (A)
psbB (-54/+3)                +    s    SacI/NcoI (e)   pHK17 (U)       pHK37 (A)
T7g10+DB/Ec (-63/+24) (b)    +    Ec   SacI/NheI       pHK18 (B)       pHK38 (A)
T7g10+DB/pt (-63/+24) (b)    +    pt   SacI/NheI       pHK19 (B)       pHK39 (A)
T7g10-DB (-63/+9)            +    s    SacI/NheI       pHK20 (B)       pHK40 (A)
psbA (-85/+21)               -    wt   SacI/NheI       pHK21 (U)       pHK41 (A)
psbA (-85/+3)                -    s    SacI/NcoI (e)   pHK22 (U)       pHK42 (A)
psbA(+GC) (-85/+3) (c)       -    s    SacI/NcoI (e)   pHK23 (U)       pHK43 (A)

(a) SD +, SD at prokaryotic consensus position; SD -, no SD at prokaryotic
    consensus position; DB wt, wild-type; m, mutant; s, NheI site as
    synthetic DB.
(b) Ec or pt refers to construct with E. coli or plastid DB sequence.
(c) psbA(+GC) indicates addition of GC to the wild-type A at the mRNA 5'-end.
(d) In source gene psbB the translation initiation codon is within an NcoI
    site; therefore +DB construct pHK16 has this NcoI site upstream of the
    NheI site; see Figure 9.
(e) Translation initiation codon is included in the NcoI site; the NheI site
    is directly downstream in the kan coding region; see Figure 8.
The Prrn promoter fragment is available in plasmid
pPRV100A (Zoubenko et al., 1994). The promoters were
designed to include sequences between -197 nt and -114
nt upstream of the mature 16S rRNA 5' end. Nucleotide
-197 is the 5'-end of the Prrn promoter constructs
utilized for these and other studies (Svab and Maliga,
1993; -1 is the first nucleotide upstream of the mature
16S rRNA). The G at the -114 position is one of three
transcription initiation sites; the other two are the
adjacent C (-113) and A (-111) nucleotides (Allison et
al., 1996; Sriraman et al., 1998). The nucleotide at
which Prrn transcription would initiate is marked by a
filled circle in Figure 3A-D. In most constructs, this
is a G (-114) as in the native promoter. In two
constructs the G was replaced by an A, as in the psbA
promoter which is the source of the leader sequence
(pHK21, pHK22; see below).

DESIGN OF THE 5' LEADER FROM atpB

For the atpB gene, multiple mRNA 5'-ends were
mapped in tobacco leaves, including at least four primary
transcripts, indicating transcription from four
promoters, and a processed 5'-end 90 nucleotides upstream
of the translation initiation codon (Orozco et al.,
1990). The terminal nucleotide of the processed atpB
5'-end is a G. Therefore, the chimeric PrrnLatpB
promoters were designed to initiate transcription at a
G, anticipating that the leader sequence of the chimeric
transcript will be a perfect reproduction of the
processed atpB mRNA 5'-end. Out of the atpB coding
region, 42 and 6 nucleotides are included in the +DBwt
and -DB constructs, respectively. The 42 nucleotides
include four potential DB sequences shown in Figure 2A.
Two point mutations in the leader sequence were designed
to eliminate NheI (T to A) and EcoRI (G to A)
restriction sites without affecting the predicted mRNA
5' secondary structure. In the -DB constructs, two
codons (6 nucleotides) were retained from the native
coding region upstream of the NheI restriction site
(GCTAGC sequence) in which the stop codon is
out-of-frame (Figure 3A). Eleven silent point mutations
were introduced in the DB region of the PrrnLatpB+DBm
construct to either minimize the number of base pairs,
or to change the nature of base pairing (for example G-C
to G-U) (Figure 2A; Figure 3A).
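That the mutations in the +DBm construct are silent can be checked from the primers listed in Table 2. The following minimal sketch assumes that oligonucleotides #3 (used in PCR-2 for +DBwt) and #24 (used in PCR-4 for +DBm) are antisense primers whose 5' tail is CCC followed by the NheI site, with the remainder complementary to the atpB coding strand; this reading is an inference from the primer sequences, not a statement in the text. The standard genetic code is used, and the helper names are illustrative only.

```python
# Sketch: compare the coding sequence implied by oligonucleotides #3 and #24
# (Table 2). Assumption: both are antisense primers with a 5' CCC + NheI tail.

BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(seq):
    """Translate a DNA sequence codon by codon (standard code)."""
    usable = len(seq) - len(seq) % 3
    return "".join(CODON_TABLE[seq[i:i + 3]] for i in range(0, usable, 3))


OLIGO_3 = "CCCGCTAGCCGTGGAAACCCCAGAACC"
OLIGO_24 = "CCCGCTAGCCGTGGACACCCCACTTCCACTTGTTGTCGGGTTTATTCTCAT"
TAIL = len("CCC" + "GCTAGC")  # 5' clamp plus NheI site

# Sense-strand sequences covered by each primer.
wt_tail = reverse_complement(OLIGO_3[TAIL:])   # last 18 nt of the 42 nt region
mutant = reverse_complement(OLIGO_24[TAIL:])   # full 42 nt mutant region
mut_tail = mutant[-len(wt_tail):]

changes = sum(a != b for a, b in zip(wt_tail, mut_tail))

if __name__ == "__main__":
    print("wild-type tail :", wt_tail, translate(wt_tail))
    print("mutant tail    :", mut_tail, translate(mut_tail))
    print("nucleotide changes in this stretch:", changes)
```

Under these assumptions, the stretch covered by both primers differs at the nucleotide level while encoding the same amino acids, as expected for silent mutations. Only the last six codons can be compared this way, because oligonucleotide #3 does not span the whole 42 nucleotide region.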

DESIGN OF THE 5' LEADER FROM clpP 

Two major mRNA 5'-ends of the clpP gene were mapped
in tobacco leaves (Hajdukiewicz et al., 1997). The
terminal nucleotide of the proximal primary transcript
is a G. Therefore, the chimeric PrrnLclpP promoters were
designed to initiate transcription at a G, anticipating
that the leader sequence of the chimeric transcript will
be a perfect reproduction of the leader transcribed from
the PclpP-53 NEP promoter. Out of the clpP coding region,
48 and 6 nucleotides are retained in the +DBwt and -DB
constructs, respectively. The 48 nucleotides include
four potential DB sequences as shown in Figure 2A. In
the -DB constructs, two codons (6 nucleotides) were
retained from the native coding region upstream of the
NheI restriction site (GCTAGC sequence) in which the
stop codon is out-of-frame.
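The statement that the TAG within the NheI site is out-of-frame can be verified from the -DB primer. The sketch below assumes that oligonucleotide #7 of Table 2 (used in PCR-3 for PrrnLclpP-DB) is an antisense primer, so that its reverse complement gives the sense strand across the ATG/NheI junction; variable names are illustrative only.

```python
# Sketch: check that the TAG inside the NheI site (GCTAGC) is not read as an
# in-frame stop codon. Assumption: oligonucleotide #7 is an antisense primer.

STOP_CODONS = {"TAA", "TAG", "TGA"}


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


OLIGO_7 = "CCCGCTAGCAGGCATTAAATGAAAGAAAGAAC"

sense = reverse_complement(OLIGO_7)
atg = sense.find("ATG")                      # translation initiation codon
codons = [sense[i:i + 3] for i in range(atg, len(sense) - 2, 3)]
in_frame_stops = [c for c in codons if c in STOP_CODONS]

nhe_site = sense.find("GCTAGC")
tag_offset = (nhe_site + 2 - atg) % 3        # 0 would mean TAG is in frame

if __name__ == "__main__":
    print("sense strand:", sense)
    print("codons from ATG:", " ".join(codons))
    print("in-frame stops:", in_frame_stops)
    print("frame offset of TAG relative to ATG:", tag_offset)
```

Because the NheI site begins at a codon boundary (six nucleotides after the A of the ATG), the reading frame passes through GCT and AGC, and the TAG straddles the two codons.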
DESIGN OF THE 5' LEADER FROM rbcL

One primary and one processed mRNA 5'-end were
mapped in tobacco leaves for the rbcL gene (Shinozaki
and Sugiura, 1982). The terminal nucleotide of the
processed 5' end is a T. The chimeric PrrnLrbcL
promoters were designed to initiate transcription at a
G, one nucleotide downstream of the terminal T.
Forty-two and 6 nucleotides out of the rbcL coding
region are included in the +DB and -DB constructs,
respectively. The 42 nucleotides include four potential
DB sequences as shown in Figure 2A. The one point
mutation (G to A) in the leader sequence was designed to
eliminate an EcoRI restriction site without affecting
the predicted mRNA 5' secondary structure. In the -DB
constructs, two codons (6 nucleotides) were retained
from the native coding region upstream of the NheI
restriction site (GCTAGC sequence) in which the stop
codon is out-of-frame. Twelve silent point mutations
were introduced into the DB region of the PrrnLrbcL+DBm
construct to either minimize the number of base pairs,
or to change the nature of base pairing (for example G-C
to G-U) (Figure 2A, Figure 3B).

DESIGN OF THE 5' LEADER FROM psbB

One primary and one processed mRNA 5'-end for the
psbB gene were tentatively identified in tobacco leaves
(Tanaka et al., 1987). The leader sequence was designed
to initiate transcription from the G (-114) of the Prrn
promoter, and to include the intact secondary (stem)
structure assumed to be involved in stabilizing the
mRNA. Forty-five and 3 nucleotides out of the psbB
coding region are included in the +DB and -DB
constructs, respectively. The 45 nucleotides include
four potential DB sequences shown in Figure 2A. Since
the ATG is naturally included in an NcoI site that is
used to fuse the neo coding region with the psbB leader,
no amino acid from the psbB coding region is added in
the -DB construct.

DESIGN OF THE 5' LEADER FROM psbA

One mRNA 5'-end was mapped for the psbA gene in
tobacco leaves (Sugita and Sugiura, 1984). The terminal
nucleotide of the primary transcript is an A. Therefore,
the chimeric PrrnLpsbA promoters were designed to
initiate transcription at an A, anticipating that the
leader sequence of the chimeric transcript will be a
perfect reproduction of the leader transcribed from the
psbA promoter. Twenty-one and 3 nucleotides out of the
psbA coding region are included in the +DB and -DB
constructs, respectively. The 21 nucleotides include the
potential DB sequence as shown in Figure 2A. Since the
neo coding region was linked to the chimeric promoter
via an NcoI site which includes the translation
initiation codon (ATG), no amino acid from the psbA
coding region is added in the -DB constructs. This is
also true of a second -DB promoter, in plasmid pHK23, in
which transcription is designed to initiate from the
Prrn G (-114) and C (-113) (Figure 3C).

DESIGN OF THE T7 PHAGE GENE 10 LEADER 

The T7 phage gene 10 leader (63 nucleotides) was
shown to promote efficient translation initiation in E.
coli (Olins et al., 1988). This leader is used in the E.
coli pET expression vectors (Studier et al., 1990;
Novagen Inc.). The terminal nucleotide at the 5'-end is
a G. Therefore, the chimeric PrrnT7g10L promoters were
designed to initiate transcription at a G, anticipating
that the leader sequence of the chimeric transcript will
be a reproduction of the T7 phage gene 10 mRNA, with the
exception of a T to A mutation which was introduced to
eliminate an XbaI site. Twenty-four and 9 nucleotides
from the T7 phage gene 10 coding region are included in
the +DB/Ec (with E. coli DB sequence) and -DB
constructs, respectively. To compare the efficiency of
E. coli and plastid DB sequences in plastids, a second
+DB promoter was constructed with the tobacco DB
sequence (PrrnT7g10L+DB/pt). The native T7g10 leader has
an NheI site directly downstream of the translation
initiation codon. This NheI site was removed by a T to
A point mutation in the +DB constructs (Figure 3D).

For introduction into the plastid genome, the
chimeric neo genes were cloned into plastid
transformation vector pPRV111A or pPRV111B. See U.S.
Patent 5,877,402, the disclosure of which is
incorporated herein by reference. The pPRV111 vectors
target insertions into the inverted repeat region of the
tobacco plastid genome, and carry a selectable
spectinomycin (aadA) resistance gene. The sequences of
the vectors have been deposited in GenBank (U12812,
U12813). The chimeric neo gene in vector pPRV111B is in
tandem with the aadA gene, whereas in vector pPRV111A
the chimeric neo is oriented divergently. The general
outline of the plastid transformation vector with the
chimeric neo genes is shown in Figures 4A and 4B.

CONSTRUCTION OF CHIMERIC Prrn PROMOTERS WITH PLASTID
mRNA LEADERS

The chimeric Prrn promoter/leader fragments were
constructed as SacI-NheI or SacI-NcoI fragments (Table
1) by overlap extension PCR (SOE-PCR), essentially as
described in Lefebvre et al. (1995). Construction of the
Prrn-plastid leader segments is schematically shown in
Figure 5. The objective of the PCR-1 step is to 1)
amplify the Prrn promoter fragment while 2) adding a
SacI site upstream and a seamless overlap with the
specific downstream leader sequence. The reaction
contains: 1) a primer (oligonucleotide) to add a SacI
site at the 5'-end of the fragment; 2) a suitable
template containing the Prrn promoter sequence in
plasmid pPRV100A (Zoubenko et al., 1994); and 3) a
primer to add on the overlap with the leader sequence at
the 3'-end of the amplified product. The objective of
the PCR-2 step is to create the chimeric promoter with
DB sequence using: 1) the product of the PCR-1 step as a
primer; 2) a suitable DNA template containing the
specific leader sequence; and 3) a primer
(oligonucleotide) to include an NheI restriction site at
the 3'-end of the amplification product. The product of
PCR-2 is the SacI-NheI chimeric Prrn promoter fragment
with DB sequence. The objective of the PCR-3 step is to
remove the DB sequence while introducing a suitable NheI
or NcoI restriction site. The product of PCR-3 is the
SacI-NheI or SacI-NcoI chimeric Prrn promoter fragment
in which the DB sequence is replaced with the NheI site.
The objective of the PCR-4 step is to replace the
wild-type DB with a mutant DB. The product of PCR-4 is a
SacI-NheI Prrn promoter fragment.
The primers (oligonucleotides) used for the
construction of chimeric promoters are listed in Table
2. The chimeric promoters were obtained by overlap 
extension PCR using oligonucleotides and DNA templates 
schematically shown in Figure 5. 
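The restriction sites added by the primers can be located with a simple string search. The sketch below is an illustration only: it uses three of the oligonucleotides from Table 2 (#1, #3 and #21) and the recognition sequences of SacI (GAGCTC), NheI (GCTAGC) and NcoI (CCATGG); the dictionary and function names are hypothetical.

```python
# Sketch: locate the restriction sites carried by selected Table 2 primers.

SITES = {"SacI": "GAGCTC", "NheI": "GCTAGC", "NcoI": "CCATGG"}

PRIMERS = {
    "#1": "CCCGAGCTCGCTCCCCCGCCGTCGTTC",
    "#3": "CCCGCTAGCCGTGGAAACCCCAGAACC",
    "#21": "GGGCCATGGTAAAATCTTGGTTTATTTAATC",
}


def find_sites(seq):
    """Return {enzyme: 0-based position} for every site present in seq."""
    return {name: seq.find(site) for name, site in SITES.items() if site in seq}


if __name__ == "__main__":
    for name, seq in PRIMERS.items():
        print(name, find_sites(seq))
```

In each of these primers the site lies directly after three 5' nucleotides (CCC or GGG), which is consistent with the sites being added at the ends of the amplified fragments.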

Table 2. Oligonucleotides used for the construction of
chimeric promoters.

#1:  5'-CCCGAGCTCGCTCCCCCGCCGTCGTTC-3'
#2:  5'-CGAATTTAAAATAAATGTCCGCTTGCACGTCGATCGGTTAATTCTCCCAGAAATATAGCCATCC-3'
#3:  5'-CCCGCTAGCCGTGGAAACCCCAGAACC-3'
#4:  5'-CCCGCTAGCTCTCATAATAATAAAATAAATAAATATGTC-3'
#5:  5'-TCACTTTGAGGTGGAAACGTAACTCCCAGAAATATAGCCATCC-3'
#6:  5'-CCCGCTAGCTTCCTCTCCAGGACTTCG-3'
#7:  5'-CCCGCTAGCAGGCATTAAATGAAAGAAAGAAC-3'
#8:  5'-TAAGAATTTTCACAACAACAAGGTCTACTCGACTCCCAGAAATATAGCCATCC-3'
#9:  5'-CCCGCTAGCTTTGAATCCAACACTTGCTTTAG-3'
#10: 5'-CCCGCTAGCTGACATAAATCCCTCCCTAC-3'
#11: 5'-CAAAGATAAATAGACACTACGTAACTTTATTGCATTGCTCCCAGAAATATAGCCATCC-3'
#12: 5'-CCCGCTAGCATCATTCAATACAACGGTATGAACACG-3'
#13: 5'-TTCTAGTGGGAAACCGTTGTGGTCTCCCTCCCAGAAATATAGCCATCC-3'
#14: 5'-CCCGCTAGCCATATGTATATCTCCTTCTTAAAG-3'
#15: 5'-CCCGCTAGCCTGTCCACCAGTCATGCTTGCCATA-3'
#16: 5'-CCCGCTAGCCAAGGCAGGGCTAGTGATTGCCATATGTATATCTCCTTC-3'
#17: 5'-TTTGTTTAACTTTAAGAAGGAGATATACATATGGCAAGCATGACTGGTGG-3'
#18: 5'-CTCCTTCTTAAAGTTAAACAAAATTATTTCTAGTGGGAAACCGTTGT-3'
#19: 5'-CAAAATAGAAAATGGAAGGCTTTTTGCTCCCAGAAATATAGCCATCCC-3'
#20: 5'-CAAAATAGAAAATGGAAGGCTTTTTTCCCAGAAATATAGCCATCCC-3'
#21: 5'-GGGCCATGGTAAAATCTTGGTTTATTTAATC-3'
#22: 5'-GGGGCTAGCTCTCTCTAAAATTGCAGT-3'
#23: 5'-GAATAGCCTCTCCACCCA-3'
#24: 5'-CCCGCTAGCCGTGGACACCCCACTTCCACTTGTTGTCGGGTTTATTCTCAT-3'
#25: 5'-CCCGCTAGCTTTGAATCCTACTGAGGCTTTTGTTTCTGTTTGAGGACTCAT-3'
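The overlap design of the PCR-1 primers can be made visible by comparing their 3' ends. The following minimal sketch takes four of the PCR-1 overlap primers from Table 2 (#2, #5, #8 and #11) and computes their longest common 3' suffix. The interpretation that this shared segment anneals to the Prrn template, while the preceding 5' portion is the leader-specific overlap, is an assumption based on the description of the PCR-1 step; the function name is illustrative.

```python
# Sketch: find the 3' segment shared by several PCR-1 overlap primers.
# Assumption: the shared 3' end is the Prrn-annealing part of each primer.

OVERLAP_PRIMERS = {
    "#2": "CGAATTTAAAATAAATGTCCGCTTGCACGTCGATCGGTTAATTCTCCCAGAAATATAGCCATCC",
    "#5": "TCACTTTGAGGTGGAAACGTAACTCCCAGAAATATAGCCATCC",
    "#8": "TAAGAATTTTCACAACAACAAGGTCTACTCGACTCCCAGAAATATAGCCATCC",
    "#11": "CAAAGATAAATAGACACTACGTAACTTTATTGCATTGCTCCCAGAAATATAGCCATCC",
}


def common_suffix(seqs):
    """Return the longest suffix shared by all sequences."""
    seqs = list(seqs)
    shortest = min(len(s) for s in seqs)
    n = 0
    while n < shortest and len({s[-(n + 1)] for s in seqs}) == 1:
        n += 1
    return seqs[0][len(seqs[0]) - n:]


shared = common_suffix(OVERLAP_PRIMERS.values())

if __name__ == "__main__":
    print("shared 3' segment:", shared, f"({len(shared)} nt)")
    for name, seq in OVERLAP_PRIMERS.items():
        leader_part = seq[: len(seq) - len(shared)]
        print(name, "leader-specific 5' part:", leader_part)
```

The primers share a 3' segment of about 20 nucleotides and differ only in their 5' portions, as would be expected if each primer joins the same promoter end to a different leader.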

CONSTRUCTION OF CHIMERIC Prrn PROMOTER/atpB LEADER
SEGMENTS

PrrnLatpB+DBwt in plasmid pHK10 (Product of PCR-2)
PrrnLatpB-DB in plasmid pHK11 (Product of PCR-3)
PrrnLatpB+DBm in plasmid pHK50 (Product of PCR-4)

PCR-1: Oligonucleotides #1 and #2 as primers; plasmid
pPRV100A (Zoubenko et al., 1994) DNA as template.
PCR-2: Product of PCR-1 step and oligonucleotide #3 as
primers; plasmid pIK79 (see below) DNA as template.
PCR-3: Oligonucleotides #1 and #4 as primers; product of
PCR-2 step as template.
PCR-4: Oligonucleotides #1 and #24 as primers; product of
PCR-2 step as template.

Plasmid pIK79 is a Bluescript BS+ phagemid derivative
which carries a PvuII/XhoI tobacco plastid DNA fragment
between nucleotides 55147-50484 containing the
rbcL-atpB intergenic region with divergent promoters for
these genes (Shinozaki et al., 1986).

CONSTRUCTION OF CHIMERIC Prrn PROMOTER/clpP LEADER
SEGMENTS

PrrnLclpP+DBwt in plasmid pHK12 (Product of PCR-2)
PrrnLclpP-DB in plasmid pHK13 (Product of PCR-3)

PCR-1: Oligonucleotides #1 and #5 as primers; plasmid
pPRV100A (Zoubenko et al., 1994) DNA as template.
PCR-2: Product of PCR-1 step and oligonucleotide #6 as
primers; tobacco Sal8 ptDNA fragment (Shinozaki et al.,
1986) as template.
PCR-3: Oligonucleotides #1 and #7 as primers; product of
PCR-2 step as template.

CONSTRUCTION OF CHIMERIC Prrn PROMOTER/rbcL LEADER
SEGMENTS

PrrnLrbcL+DBwt in plasmid pHK14 (Product of PCR-2)
PrrnLrbcL-DB in plasmid pHK15 (Product of PCR-3)
PrrnLrbcL+DBm in plasmid pHK54 (Product of PCR-4)

PCR-1: Oligonucleotides #1 and #8 as primers; plasmid
pPRV100A (Zoubenko et al., 1994) DNA as template.
PCR-2: Product of PCR-1 step and oligonucleotide #9 as
primers; plasmid pIK79 DNA (see description of pHK10
above) as template.
PCR-3: Oligonucleotides #1 and #10 as primers; product of
PCR-2 step as template.
PCR-4: Oligonucleotides #1 and #25 as primers; product of
PCR-2 step as template.

CONSTRUCTION OF CHIMERIC Prrn PROMOTER/psbB LEADER
SEGMENTS

PrrnLpsbB+DBwt in plasmid pHK16 (Product of PCR-2)
PrrnLpsbB-DB in plasmid pHK17 (Promoter from pHK16,
digested with SacI/NcoI)

PCR-1: Oligonucleotides #1 and #11 as primers; plasmid
pPRV100A (Zoubenko et al., 1994) DNA as template.
PCR-2: Product of PCR-1 step and oligonucleotide #12 as
primers; tobacco Sal8 ptDNA fragment (Shinozaki et al.,
1986) as template.

PCR-3 was not necessary, since the psbB translation
initiation codon is naturally included in an NcoI site.
Therefore, the -DB derivative could be obtained by
digesting the pHK16 promoter with SacI and NcoI.
CONSTRUCTION OF CHIMERIC Prrn PROMOTER/psbA LEADER
SEGMENTS

PrrnLpsbA+DBwt in plasmid pHK21 (Product of PCR-2)
PrrnLpsbA-DB in plasmid pHK22 (Product of PCR-3)

PCR-1: Oligonucleotides #1 and #20 as primers; plasmid
pPRV100A (Zoubenko et al., 1994) DNA as template.
PCR-2: Product of PCR-1 step and oligonucleotide #22 as
primers; tobacco Sal3 ptDNA fragment (Shinozaki et al.,
1986) as template.
PCR-3: Oligonucleotides #1 and #21 as primers; product of
PCR-2 step as template.

PrrnLpsbA(GC)-DB in plasmid pHK23 (Product of PCR-2)

PCR-1: Oligonucleotides #1 and #19 as primers; plasmid
pPRV100A (Zoubenko et al., 1994) DNA as template.
PCR-2: Product of PCR-1 step and oligonucleotide #21 as
primers; tobacco Sal3 ptDNA fragment (Shinozaki et al.,
1986) as template.

In all of the above, PCR amplification was carried
out with AmpliTaq DNA polymerase (Perkin Elmer) or Pfu
DNA polymerase (Stratagene), and "stepdown" PCR, which
utilizes gradually decreasing annealing temperatures, was
performed (Hecker and Roux, 1996). The exact
amplification conditions for the chimeric Prrn::LatpB
promoters are given below. The amplification conditions
for the remaining chimeric Prrn-plastid leader
promoters were calculated according to Hecker and Roux
(1996), and differ only in the annealing temperatures.
A description of the PCR conditions for the construction
of the chimeric Prrn promoters with plastid mRNA leaders
is given below; for interpretation of individual steps
see the scheme in Figure 5.





PCR-1 Program: 50 picomoles of both primers per 100 μl

Step   Operation   Time and temperature   Cycles
------------------------------------------------
1.1    Denature    5 min. at 94 °C
2.1    Denature    1 min. at 94 °C
2.2    Annealing   0.5 min. at 72 °C      3 cycles
2.3    Extension   0.5 min. at 72 °C
3.1    Denature    1 min. at 94 °C
3.2    Annealing   0.5 min. at 69 °C      3 cycles
3.3    Extension   0.5 min. at 72 °C
4.1    Denature    1 min. at 94 °C
4.2    Annealing   0.5 min. at 66 °C      3 cycles
4.3    Extension   0.5 min. at 72 °C
5.1    Denature    1 min. at 94 °C
5.2    Annealing   0.5 min. at 63 °C      3 cycles
5.3    Extension   0.5 min. at 72 °C
6.1    Denature    1 min. at 94 °C
6.2    Annealing   0.5 min. at 60 °C      3 cycles
6.3    Extension   0.5 min. at 72 °C
7.1    Denature    1 min. at 94 °C
7.2    Annealing   0.5 min. at 57 °C      20 cycles
7.3    Extension   0.5 min. at 72 °C
8.1    Extension   10 min. at 72 °C
8.2                1 min. at 30 °C





The PCR-2 program was essentially identical to the PCR-1
program set forth above with the following
modifications: 1) the primers in 100 μl were the product
of the first PCR reaction together with 50 picomoles of
the oligonucleotide primer; and 2) the annealing
temperature in stepdown PCR was from 67 °C to 52 °C.
Accordingly, the following annealing temperatures were
used: Step 2.2, 67 °C; Step 3.2, 64 °C; Step 4.2, 61 °C;
Step 5.2, 58 °C; Step 6.2, 55 °C; Step 7.2, 52 °C.

The PCR-3 and PCR-4 programs were essentially identical
to the PCR-1 program with the following modification:
the annealing temperature in stepdown PCR was from 69
°C to 44 °C. Accordingly, the following annealing
temperatures were used: Step 2.2, 69 °C; Step 3.2, 64
°C; Step 4.2, 59 °C; Step 5.2, 54 °C; Step 6.2, 49 °C;
Step 7.2, 44 °C. In cases where the yield of the final
PCR reaction was too low for efficient cloning, the final
product was reamplified using the primers which were used
to generate the ends. The final PCR products were
digested with the appropriate restriction enzymes (SacI
and NheI or SacI and NcoI) and cloned in plasmids pHK2
or pHK3 (see below).
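The annealing temperatures of the three stepdown programs follow a regular pattern that can be reproduced with a few lines of code. The sketch below is only an illustration: it assumes six annealing steps spaced evenly between the first and last temperature, with three cycles at each of the first five steps and 20 cycles at the last, as in the PCR-1 program; whether the cycle numbers were kept unchanged in the PCR-2, PCR-3 and PCR-4 programs is assumed from the statement that those programs were essentially identical. The function name and parameters are hypothetical.

```python
# Sketch: reproduce the stepdown annealing schedules described above.
# Assumption: evenly spaced steps, 3 cycles per early step, 20 at the last.


def stepdown_schedule(start_c, end_c, n_steps=6, early_cycles=3, final_cycles=20):
    """Return a list of (annealing temperature in °C, number of cycles)."""
    span = start_c - end_c
    if span % (n_steps - 1):
        raise ValueError("temperature range is not evenly divisible")
    step = span // (n_steps - 1)
    schedule = []
    for i in range(n_steps):
        cycles = final_cycles if i == n_steps - 1 else early_cycles
        schedule.append((start_c - i * step, cycles))
    return schedule


if __name__ == "__main__":
    programs = {"PCR-1": (72, 57), "PCR-2": (67, 52), "PCR-3/PCR-4": (69, 44)}
    for name, (start, end) in programs.items():
        sched = stepdown_schedule(start, end)
        temps = ", ".join(f"{t} °C" for t, _ in sched)
        total = sum(c for _, c in sched)
        print(f"{name}: {temps} ({total} cycles in total)")
```

The PCR-1 and PCR-2 programs step down by 3 °C per stage, whereas the PCR-3 and PCR-4 programs step down by 5 °C per stage to reach the lower final annealing temperature.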

CONSTRUCTION OF CHIMERIC PROMOTERS WITH T7 PHAGE GENE 10
mRNA LEADER SEGMENT

The chimeric Prrn promoter/T7 gene 10 leader
(PrrnLT7g10) fragments were constructed as SacI-NheI
fragments (Table 1).

PrrnLT7g10+DB/Ec promoter in plasmid pHK18
In the absence of a proper DNA template, the
PrrnLT7g10+DB/Ec was constructed by employing a modified
polymerase chain reaction (Uchida, 1992) in two PCR
steps, as schematically shown in Figure 6. The PCR-1A
and PCR-1B steps generate two fragments in two separate
reactions (A and B). The objective of the PCR-1A step is
to amplify the Prrn promoter fragment while adding: 1) a
SacI site upstream (Oligonucleotide #1 in Table 2); and
2) a seamless overlap with the specific downstream
leader sequence (Oligonucleotide #13 in Table 2), using
plasmid pPRV100A (Zoubenko et al., 1994) as DNA
template. The objective of the PCR-1B step is to
amplify part of the T7g10 leader sequence using
overlapping oligonucleotides #15 and #17 in Table 2. The
NheI site is introduced in oligonucleotide #15. Both
PCR-1A and PCR-1B reactions were carried out by stepdown
PCR as described above for the construction of the
chimeric Prrn promoters.

The PCR-2 reaction generating this chimeric promoter
contained:

a) The products of the PCR-1A and PCR-1B reactions as
DNA templates;

b) Oligonucleotide #18 (0.5 picomole; Table 2) to
generate overlapping fragments with products of the
PCR-1A and PCR-1B reactions;

c) Oligonucleotides #1 and #15 (Table 2) for
amplification of the final product, 50 picomoles each in
100 μl final volume.

The promoter was amplified by stepdown PCR, as
described for the chimeric Prrn promoters above; the
annealing temperatures were between 72 °C and 57 °C.
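The primer amounts given above can be converted to concentrations, which makes the difference between the bridging oligonucleotide and the amplification primers explicit. The short sketch below is a unit conversion only; the function name is hypothetical. It relies on the identity that picomoles per microliter equal micromoles per liter.

```python
# Sketch: convert primer amounts (picomoles in a given volume) to molarity.


def micromolar(picomoles, volume_ul):
    """Concentration in µM; 1 pmol/µl equals 1 µmol/l."""
    return picomoles / volume_ul


amplification_primer_um = micromolar(50, 100)   # oligonucleotides #1 and #15
bridging_oligo_um = micromolar(0.5, 100)        # oligonucleotide #18

if __name__ == "__main__":
    print(f"amplification primers: {amplification_primer_um} µM each")
    print(f"bridging oligonucleotide: {bridging_oligo_um * 1000:.0f} nM")
    print(f"ratio: {amplification_primer_um / bridging_oligo_um:.0f}-fold")
```

The bridging oligonucleotide is thus present at a 100-fold lower concentration than the outer primers, presumably so that it serves mainly to join the two fragments rather than to be amplified itself.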

PrrnLT7g10+DB/pt promoter in plasmid pHK19
The promoter fragment was obtained in one PCR step as
shown in Figure 7. The reaction contained:

a) The product of the PCR-2 reaction generating promoter
PrrnLT7g10+DB/Ec in plasmid pHK18 as DNA template; and

b) Oligonucleotides #1 and #16 (Table 2), 50 picomoles
each in 100 μl final volume.

The promoter was amplified by stepdown PCR, as
described for the construction of chimeric Prrn
promoters above; the annealing temperatures were between
72 °C and 52 °C.

PrrnLT7g10-DB promoter in plasmid pHK20
The promoter fragment was obtained in one PCR step,
which is similar to the PCR-3 step in Figure 5. The
reaction contained:

a) The product of the PCR-2 reaction generating promoter
PrrnLT7g10+DB/Ec in plasmid pHK18 as DNA template; and

b) Oligonucleotides #1 and #14 (Table 2), 50 picomoles
each in 100 μl final volume.

The promoter was amplified by stepdown PCR, as
described for the chimeric Prrn promoters above; the
annealing temperatures were between 72 °C and 52 °C.

The final PCR products were digested with the SacI
and NheI restriction enzymes and cloned in plasmid pHK3
to obtain plasmids pHK18, pHK19 and pHK20.

Construction of chimeric neo genes 

Construction of the chimeric promoters was
described in the preceding sections. For determining
effects on levels of protein accumulation, the promoters
were cloned upstream of a kanamycin-resistance encoding
construct, consisting of the neo coding region and the
3'-UTR of the plastid rbcL gene. Such constructs are
available in plasmids pHK2 and pHK3, which carry the
same Prrn(L)rbcL(S)::neo::TrbcL gene as a SacI-HindIII
fragment. Plasmid pHK2 is a pUC118 vector derivative;
pHK3 is a pBSIIKS+ derivative. Plasmid maps with
relevant restriction sites are shown in Figure 8. The DNA
sequence of the neo gene in plasmids pHK2 and pHK3 is
shown in Figure 9. Note that in plasmid pHK2 the neo
gene has an EcoRI site upstream of the SacI site (Figure
8). Prrn and TrbcL have been described by Staub and
Maliga, 1994; the neo gene derives from plasmid pSC1
(Chaudhuri and Maliga, 1996). The pUC118 and pBSIIKS+
plasmid derivatives which carry the various promoter
constructs are listed in Table 1.

To determine the DNA sequence of the promoter
fragments, the plasmids were purified with the QIAGEN
Plasmid Purification Kit following the manufacturer's
recommendations. DNA sequencing was carried out using a
T7 DNA sequencing kit (version 2.0 DNA, Amersham Cat.
No. US70770) and primer #23 in Table 2, which is
complementary to the neo coding sequence. These promoter
sequences are shown in Figure 3A-D.

Introduction of chimeric neo genes into the tobacco 
plastid genome 

Suitable vectors are available for the introduction
of foreign genes into the tobacco plastid genome. Such
vectors are pPRV111A and pPRV111B, which carry a
selectable spectinomycin-resistance (aadA) gene and
target insertions into the repeated region of the
plastid genome (Zoubenko et al., 1994). The chimeric neo
genes were cloned into one of these plastid
transformation vectors (Table 1) and introduced into the
tobacco plastid genome by the biolistic process. Plants
were regenerated from the transformed cells by
standard protocols (Svab and Maliga, 1993). A uniform
population of transformed plastid genome copies was
confirmed by Southern analysis.

For Southern analysis, total cellular DNA was
prepared by the CTAB method (Saghai-Maroof et al.,
1984). Two leaves of each transformed plant were
homogenized and incubated at 60 °C for 30 minutes in a
buffer containing 2% CTAB (tetradecyl-trimethyl-ammonium
bromide), 1.4 M NaCl, 20 mM EDTA (pH 8.0), 1 mM Tris/HCl
(pH 8.0) and 100 mM β-mercaptoethanol. After chloroform
extraction, the DNA was precipitated with isopropyl
alcohol and dissolved in water or in TE buffer (10 mM
Tris, 1 mM EDTA, pH 8.0). DNA digested with an
appropriate restriction enzyme was electrophoresed on a
0.8% agarose gel and transferred to a nylon membrane using
the PosiBlot Transfer apparatus (Stratagene). The blots were
probed in Rapid Hybridization Buffer with plastid
targeting sequences labeled by random
priming (32P, Boehringer Mannheim Cat. No. 1004760).

Plastid transformation was achieved with each of
the plasmids listed in Table 1, with the exception of plasmids
pHK41 and pHK42. It appears that NPTII expression with
the psbA leader derivatives was so high that the plants
were not viable. It follows that these same leaders may
be used to advantage when fused with weaker promoters.

Transplastomic lines are designated by Nt (N.
tabacum, the species), the plasmid name (for example
pHK30), an individual line number and a letter
identifying regenerated plants. For example, the
Nt-pHK30-1D and Nt-pHK30-1C plants were both obtained by
transformation with plasmid pHK30, are derived from the
same transformation event and were regenerated from the
same culture. Nt-pHK30-2 plants are derived from an
independent transformation event. Normally, several
transformed lines per construct were obtained. However,
data are shown here for only one line per construct:
Nt-pHK30-1D, Nt-pHK31-1C, Nt-pHK60-5A, Nt-pHK32-2F,
Nt-pHK33-2A, Nt-pHK34-9C, Nt-pHK35-4A, Nt-pHK64-3A,
Nt-pHK36-1C, Nt-pHK37-2D, Nt-pHK38-2E, Nt-pHK39-3B,
Nt-pHK40-12B and Nt-pHK43-1C.
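
The naming convention above can be sketched in code. The following is an illustrative parser only; the regular expression, the `LineName` type and the function names are assumptions introduced here and are not part of the described methods.

```python
import re
from typing import NamedTuple, Optional


class LineName(NamedTuple):
    species: str           # "Nt" for N. tabacum
    plasmid: str           # transforming plasmid, e.g. "pHK30"
    event: int             # number of the independent transformation event
    plant: Optional[str]   # letter identifying the regenerated plant, if given


# Assumed pattern: Nt-<plasmid>-<event number><optional plant letter>
_PATTERN = re.compile(r"^(Nt)-(p[A-Z]+\d+)-(\d+)([A-Z])?$")


def parse_line_name(name: str) -> LineName:
    """Split a designation such as 'Nt-pHK30-1D' into its parts."""
    match = _PATTERN.match(name)
    if match is None:
        raise ValueError(f"not a recognised line name: {name!r}")
    species, plasmid, event, plant = match.groups()
    return LineName(species, plasmid, int(event), plant)


def same_event(a: str, b: str) -> bool:
    """True when two lines share the plasmid and the transformation event."""
    x, y = parse_line_name(a), parse_line_name(b)
    return (x.plasmid, x.event) == (y.plasmid, y.event)
```

Under this sketch, Nt-pHK30-1D and Nt-pHK30-1C are reported as plants from the same event, whereas Nt-pHK30-2 is reported as an independent event.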

Testing mRNA accumulation by RNA gel blot (Northern)
analysis

RNA gel blot analysis was performed to determine
steady-state levels of chimeric mRNA in the
transplastomic lines. Total RNA was prepared from
the leaves and roots of plants grown in sterile culture
according to Stiekema et al. (1988). RNA (4 µg per lane)
was electrophoresed on a 1% agarose gel and transferred to
nylon membranes using the PosiBlot Transfer apparatus
(Stratagene). The blots were probed in Rapid
Hybridization Buffer (Amersham) with a 32P-labeled neo
probe (Pharmacia, Ready-To-Go Random Priming Kit). The
neo probe was obtained by isolating the NheI/XbaI
fragment from plasmid pHK2. The template for probing
the tobacco cytoplasmic 25S rRNA was a fragment which
was PCR amplified from total tobacco cellular DNA with
primers 5'-TCACCTGCCGAATCAACTAGC-3' and
5'-GACTTCCCTTGCCTACATTG-3'. RNA hybridization signals were
quantified using a Molecular Dynamics PhosphorImager
and normalized to the 25S rRNA signal.
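
The normalization step may be illustrated as follows. This is a minimal sketch with hypothetical function names; expressing each line as a percentage of the highest normalized value is an assumption made here for illustration, and the signal values passed in would be the PhosphorImager readings.

```python
def normalized_mrna(neo_signal: float, rrna_signal: float) -> float:
    """neo hybridization signal divided by the 25S rRNA signal of the same lane."""
    if rrna_signal <= 0:
        raise ValueError("25S rRNA signal must be positive")
    return neo_signal / rrna_signal


def relative_levels(samples: dict) -> dict:
    """Map each line to its normalized neo mRNA as a percentage of the highest line.

    `samples` maps a line name to a (neo_signal, rrna_signal) pair.
    """
    norm = {name: normalized_mrna(n, r) for name, (n, r) in samples.items()}
    top = max(norm.values())
    return {name: 100.0 * value / top for name, value in norm.items()}
```

Dividing by the 25S rRNA signal corrects for lane-to-lane differences in the amount of RNA loaded before lines are compared.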


Testing NPTII accumulation by protein gel blot (Western) 
analysis 

Total soluble protein was extracted from the
leaves, roots or seeds of transgenic tobacco plants
grown in sterile culture. In the case of leaves grown in
sterile culture, about 200 mg leaf tissue was
homogenized in 1 ml of buffer containing 50 mM Hepes/KOH
(pH 7.5), 1 mM EDTA, 10 mM potassium acetate, 5 mM
magnesium acetate, 1 mM dithiothreitol and 2 mM PMSF.
The homogenate was centrifuged twice at 4 °C to remove
insoluble material. Protein concentration was determined
using the Bio-Rad Protein Assay reagent kit. Transgenic
tobacco plants expressing neo in the plastid genome
(Nt-pTNH32-70, Carrer et al., 1993) and wild-type plants
were used as positive and negative controls,
respectively. Proteins were separated in SDS
polyacrylamide gels (SDS-PAGE; 15% acrylamide, 6 M urea)
and transferred to nitrocellulose membranes using a
semi-dry transfer apparatus (Bio-Rad). After blocking
non-specific binding sites, the membrane was incubated
with 4,000-fold diluted polyclonal rabbit antiserum
raised against NPTII (5Prime-3Prime Inc.). HRP-conjugated
secondary antibody, diluted 20,000-fold, and
ECL chemiluminescence were used for immunoblot detection
on X-ray film. NPTII was quantified on the immunoblots
by comparison of the experimental samples with a
dilution series of commercial NPTII (5Prime-3Prime).
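
Quantification against a dilution series may be sketched as an interpolation between the two standards that bracket the sample signal. The functions below are illustrative assumptions (linear interpolation, hypothetical names and units); the comparison described above may equally be made by eye, and film response is not necessarily linear.

```python
def interpolate_amount(signal: float, standards: list) -> float:
    """Estimate the NPTII amount for a band signal from a dilution series.

    `standards` is a list of (amount_ng, signal) pairs for commercial NPTII.
    Linear interpolation between the two bracketing standards is assumed.
    """
    points = sorted(standards, key=lambda p: p[1])
    for (a0, s0), (a1, s1) in zip(points, points[1:]):
        if s0 <= signal <= s1:
            if s1 == s0:
                return a0
            return a0 + (a1 - a0) * (signal - s0) / (s1 - s0)
    raise ValueError("signal lies outside the range of the dilution series")


def percent_of_total_soluble_protein(nptii_ng: float, loaded_ng: float) -> float:
    """NPTII expressed as a percentage of the total soluble protein loaded."""
    return 100.0 * nptii_ng / loaded_ng
```

Samples whose signal falls outside the dilution series are rejected rather than extrapolated, which mirrors the usual practice of re-running such samples at a different dilution.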

EXAMPLE 1 

DB sequences enhance protein accumulation from rbcL
leader; protein accumulation from the atpB translation
control signals is high but DB-independent

The role of DB sequences in mRNA translation was
tested using neo as the reporter gene. The neo gene
encodes the bacterial enzyme neomycin phosphotransferase
(NPTII) (Beck et al., 1982). The tested neo genes have
the same promoter (Prrn) and transcription terminator
(TrbcL), and differ only with respect to the translation
control region (TCR) comprising the 5' untranslated
region of the mRNA and the coding region N-terminus. Two
constructs were prepared with each of the atpB and rbcL TCRs.
One construct contained the wild-type TCR, including the
processed 5' untranslated region and 42 nucleotides of
the coding region N-terminus (PrrnLatpB+DBwt, plasmid
pHK30, Figure 4B; PrrnLrbcL+DBwt, plasmid pHK34, Figure
4A). The second construct contained silent mutations in
the 42-nucleotide segment of the atpB and rbcL
N-terminal coding regions to either eliminate or alter
mRNA and rRNA base pairing (PrrnLatpB+DBm, plasmid
pHK60, Figure 2A and Figure 4B; PrrnLrbcL+DBm, plasmid pHK64,
Figure 2A and Figure 4A). The silent mutations altered
the mRNA sequence without affecting the amino acid
sequence. For example, 13 potential base pairs may form
between the wild-type atpB mRNA and the ADB sequence
shown at the bottom in Figure 2A. The 11 silent
mutations affect eight base-pairing events for this
particular ADB-DB interaction. After mutagenesis, ten
base-pairing events remain possible, most of
which are new. The chimeric neo genes were introduced
into the tobacco plastid genome by homologous targeting
using the biolistic approach (Svab and Maliga, 1993;
Zoubenko et al., 1994). NPTII and neo mRNA levels were
then assessed in the leaves of transplastomic plants.
Since NPTII in wild-type DB-containing and mutant
DB-containing plants has exactly the same protein sequence,
protein levels in the plants directly reflect the
efficiency of mRNA translation. In the case of the atpB TCR,
mutagenesis of the DB reduced protein accumulation from
~7% to ~4% (Figure 10 and Table 3). In contrast,
mutagenesis of the rbcL DB had a dramatic effect, reducing
NPTII accumulation 35-fold. Thus, DB-ADB interaction is
very important for translation of the plastid rbcL mRNA,
but is less important for translation of the atpB mRNA.

We also prepared a third construct set with the
atpB and rbcL leaders, but without the native DB
(PrrnLatpB-DB, plasmid pHK31, Figure 4B; PrrnLrbcL-DB,
plasmid pHK35, Figure 4A). The neo coding region in
these constructs is directly linked to the Prrn promoter
via a synthetic NheI restriction site. The NheI
restriction site (GCTAGC) is fully complementary to the
ADB region (Figure 2B); therefore it was hoped that it
would function as a DB sequence. The utility of the NheI
site as an alternative DB could be best judged by NPTII
accumulation from the rbcL leader, which is highly
dependent on the DB. High levels of NPTII from the NheI
construct (4.7%) relative to the mutant DB (0.3%)
indicate that linking the coding region via an NheI
site provides a suitable DB for expressing foreign
polypeptides (Figure 10, Table 3).
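
Counting potential base pairs between an mRNA segment and an rRNA segment may be sketched as below. This is an illustrative helper only: the function name, the ungapped antiparallel alignment of equal-length segments and the optional G-U wobble rule are assumptions, and the actual ADB sequence (Figure 2) is not reproduced here.

```python
WATSON_CRICK = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G")}
WOBBLE = {("G", "U"), ("U", "G")}


def count_pairs(mrna_5to3: str, rrna_5to3: str, allow_wobble: bool = False) -> int:
    """Count positions that can pair when the two segments are aligned antiparallel.

    Both sequences are written 5'->3'; DNA letters (T) are read as RNA (U).
    The segments must have equal length; no gaps or bulges are modeled.
    """
    mrna = mrna_5to3.upper().replace("T", "U")
    rrna = rrna_5to3.upper().replace("T", "U")
    if len(mrna) != len(rrna):
        raise ValueError("segments must have equal length")
    allowed = WATSON_CRICK | WOBBLE if allow_wobble else WATSON_CRICK
    return sum((m, r) in allowed for m, r in zip(mrna, reversed(rrna)))
```

Because GCTAGC is its own reverse complement, `count_pairs("GCTAGC", "GCTAGC")` reports six pairs; a six-nucleotide rRNA stretch that pairs perfectly with the NheI site would therefore read the same in the 5' to 3' direction.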

TABLE 3

Levels of NPTII and neo mRNA in tobacco leaves

Line            SD    DB   NPTII (%)     neo mRNA       NPTII/neo mRNA

Nt-pTNH32-70    +           2.10±0.33    41.5             5.06
Nt-pHK30-1D     (+)   wt    7.02±0.82    70.05±12.33      8.85
Nt-pHK31-1C     (+)   s     2.52±0.79    100              2.52
Nt-pHK60-5A     (+)   m     4.03±1.45    91.57±12.76      4.40
Nt-pHK32-2F           wt    1.17±0.05    49.33±7.76       2.37
Nt-pHK33-2A           s     0.21±0.05    49.55±6.67       0.42

Nt-pHK34-9C     +     wt   10.83±3.84    48.91±22.65     22.14
Nt-pHK35-4A     +     s     4.68±1.84    21.41±7.88      21.86
Nt-pHK64-3A     +     m     0.31±0.15    52.47±4.29       0.59
Nt-pHK36-1C     +     wt    2.17±0.97    68.8             3.15
Nt-pHK37-2D     +     s     2.35±0.05    42.3             5.56
Nt-pHK38-2E     +     Ec   16.39±3.42    47.59±19.06     34.44
Nt-pHK39-3B     +     pt    0.16±0.13    13.12±1.27       1.22
Nt-pHK40-12B    +     s    23.00±5.40    90.27±31.53     25.48
Nt-pHK43-1C     (+)   s     0.65±0.28    13.2             4.92
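
The last column of Table 3 and the fold changes quoted for the DB mutants can be recomputed from the table means. The sketch below assumes that the ratio is 100 × NPTII (%) divided by the relative neo mRNA level; this reproduces most, though not every, printed ratio, so it should be read as an inference rather than the stated formula. The dictionary and function names are illustrative.

```python
# (NPTII % of total soluble protein, relative neo mRNA) taken from Table 3 means.
TABLE3 = {
    "Nt-pHK30-1D": (7.02, 70.05),   # atpB TCR, wild-type DB
    "Nt-pHK60-5A": (4.03, 91.57),   # atpB TCR, mutant DB
    "Nt-pHK34-9C": (10.83, 48.91),  # rbcL TCR, wild-type DB
    "Nt-pHK64-3A": (0.31, 52.47),   # rbcL TCR, mutant DB
}


def nptii_per_mrna(line: str) -> float:
    """Assumed definition of the last column: 100 x NPTII / neo mRNA."""
    nptii, mrna = TABLE3[line]
    return 100.0 * nptii / mrna


def fold_change(wild_type: str, mutant: str) -> float:
    """Ratio of NPTII accumulation, wild-type DB over mutant DB."""
    return TABLE3[wild_type][0] / TABLE3[mutant][0]
```

With these values the rbcL DB mutation gives a fold change of about 35, and the atpB DB mutation a fold change of about 1.7, consistent with the approximately 2-fold reduction discussed for atpB.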



DISCUSSION 

In bacteria, mutagenesis or deletion of the DB
reduces translation 2- to 34-fold, depending on the
individual mRNA (Etchegaray and Inouye, 1999; Faxén et
al., 1991; Ito et al., 1993; Mitta et al., 1997;
Sprengart et al., 1996). Furthermore, reliance on the DB
increases when the SD sequence is removed (Sprengart et
al., 1996; Wu and Janssen, 1996). In our experiments, no
variation was made in the atpB or rbcL 5'UTR; only
sequences downstream of the AUG were altered.
Mutagenesis of the atpB DB region reduced protein levels
~2-fold. Although the atpB mRNA does not have a SD
directly upstream of AUG, we speculate that it probably
has an alternate mechanism for translation initiation
that reduces its dependence on the DB. Alternatively,
translation initiation may be facilitated by activator
proteins as described for Chlamydomonas chloroplasts
(Rochaix, 1996; Stern et al., 1997). The consequence of
DB mutagenesis on rbcL translation was a dramatic
35-fold drop in NPTII levels. Accordingly, efficient rbcL
translation is highly dependent on DB-ADB interactions.
Genes in both prokaryotes and eukaryotes show biases in
the usage of the 61 amino acid codons and have a tRNA
population closely matched to the overall codon bias of
the resident mRNA population. Incorporation of
synonymous minor codons in the coding region may
dramatically reduce translation (Makrides, 1996) and
destabilize the mRNA (Deana et al., 1998). A
well-characterized example of minor codons causing reduced
expression in E. coli is the AGA/AGG arginine codons,
recognized by the same tRNA, which are present at a
frequency of 2.6 and 1.6 per thousand codons.
Therefore, we have compared codon usage bias and
frequency of triplets per 1000 nucleotides in the
wild-type and mutagenized atpB and rbcL DB regions. Since we
studied NPTII accumulation in leaves, the values shown
in Figure 12 were calculated for the highly expressed
rbcL, psaA, psaB, psaC, psbA, psbB, psbC, psbD, psbE and
psbF photosynthetic genes using the Genetics Computer
Group (GCG; Madison, Wisconsin) codon frequency program.
Codon usage bias and triplet frequency are comparable in
the wild-type and mutant DB regions of both atpB and
rbcL. In addition, the mRNAs for the wild-type and
mutant DB constructs accumulate at similar levels.
Therefore, the dramatic change in NPTII accumulation
from the PrrnLrbcL+DBm promoter in the Nt-pHK64 line
cannot be attributed to incorporation of a rare codon in
the mutant DB region.
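
The kind of codon frequency calculation referred to above may be sketched as follows. This is a stand-in written for illustration and is not the GCG codon frequency program; the function name is hypothetical, and the input would be the in-frame coding sequences of the reference genes.

```python
from collections import Counter


def codon_frequencies_per_thousand(coding_sequences: list) -> dict:
    """Frequency of each codon per 1000 codons over a set of in-frame sequences.

    RNA letters (U) are read as DNA (T); trailing bases that do not fill a
    codon are ignored.
    """
    counts = Counter()
    for seq in coding_sequences:
        seq = seq.upper().replace("U", "T")
        usable = len(seq) - len(seq) % 3
        counts.update(seq[i:i + 3] for i in range(0, usable, 3))
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no complete codons found")
    return {codon: 1000.0 * n / total for codon, n in counts.items()}
```

A codon introduced by mutagenesis can then be looked up in the resulting table to judge whether it is rare in the highly expressed reference genes.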

We have shown here that sequences downstream of the
translation initiation codon may dramatically affect
mRNA translation. Therefore, silent mutations in the DB
region of heterologous proteins may significantly
improve expression in chloroplasts by increasing
complementarity of the mRNA with the plastid rRNA
penultimate stem structure.

There are significant differences in NPTII
accumulation from neo transgenes with different leaders
and the same synthetic DB (Table 3). This indicates that
the 5'UTR is an important determinant of translation
efficiency. Many data are available supporting the
importance of the 5'UTR as a target for translational
control in higher plants (Hirose and Sugiura, 1996;
Staub and Maliga, 1993; Staub and Maliga, 1994b) and the
unicellular alga Chlamydomonas (Mayfield et al., 1994;
Nickelsen et al., 1999; Sakamoto et al., 1993; Zerges et
al., 1997). The data presented herein demonstrate that
translation efficiency in plastids is determined by
sequences both upstream and downstream of the AUG.

EXAMPLE 2

Study of phage T7g10 translation control sequences
indicates that the efficient DB in plastids has loose
complementarity to ADB

Since the actual ADB sequence is different in plastids
and E. coli, we anticipated (Sprengart et al., 1996;
Etchegaray and Inouye, 1999) that replacement of the E.
coli DB with a perfect plastid DB (100% DB-ADB
complementarity) would enhance translation in plastids.
We chose the phage T7g10 translational control region
for the study since it has a well-characterized E. coli
DB. Three Prrn promoter derivatives were constructed.
Cassette PrrnLT7g10+DB/Ec consists of Prrn fused with
the native T7g10 TCR containing the E. coli DB (plasmid
pHK38; Figure 2B, Figure 4A). Cassette PrrnLT7g10+DB/pt
consists of the Prrn promoter, T7g10 leader and the
perfect tobacco DB (pHK39; Figure 2B, Figure 4A).
Cassette PrrnLT7g10-DB has the Prrn promoter and T7g10
leader, but lacks the T7g10 DB sequence (pHK40; Figure
2B, Figure 4A). The neo coding region in these
constructs is directly linked to the Prrn promoter via a
synthetic NheI restriction site. The neo genes in the
three expression cassettes were introduced into tobacco
plastids by transformation (Svab and Maliga, 1993;
Zoubenko et al., 1994) and the leaves of transplastomic
tobacco were tested for NPTII accumulation and mRNA
levels (Figures 10, 11; Table 3).

Surprisingly, NPTII levels from the heterologous
T7g10 TCR were higher (Nt-pHK38; ~16%) than the levels
obtained from the rbcL TCR (Nt-pHK34; ~11%). We expected
that incorporation of the plastid DB with 100%
complementarity would further enhance NPTII levels.
Instead, we found that plants transformed with the
construct having the perfect plastid DB (Nt-pHK39)
contained NPTII levels 100-fold lower than the plants
expressing NPTII from the E. coli TCR (Nt-pHK38; Figure
10; Table 3). This result suggests that, unlike in E.
coli, 100% complementarity reduces, rather than enhances,
translation efficiency. Indeed, none of the highly
expressed plastid genes has a perfect DB sequence
(Figure 2A). RNA gel blots shown in Figure 11 indicate
that Nt-pHK39 plants with the perfect DB contain ~3-fold
less neo mRNA. Therefore, a contributing factor to lower
NPTII levels in these plants appears to be a faster mRNA
turnover rate. Furthermore, the NPTII proteins expressed
from the PrrnLT7g10 derivatives differ in the DB-encoded
amino acids at the N-terminus. Therefore, differential protein
turnover rates may be part of the reason for differences
in NPTII accumulation. The highest yield of NPTII (23%)
was obtained with the synthetic, NheI-containing DB
cassette.



DISCUSSION

This example utilizing the rbcL translation control
regions reveals that sequences downstream of the
translation initiation codon may dramatically affect
mRNA translation. Therefore, silent mutations in the DB
region of heterologous proteins may significantly
improve expression in chloroplasts by increasing
complementarity of the mRNA with the plastid rRNA
penultimate stem structure. However, it appears that
perfect complementarity is undesirable, as it may
accelerate mRNA turnover and reduce the rate of
translation. This finding highlights differences in the
translation machinery of plastids and E. coli, in which
perfect complementarity enhances translation (Etchegaray
and Inouye, 1999; Sprengart et al., 1996). It is
possible, however, that shifting the region of
complementarity relative to AUG or targeting a slightly
different region of the penultimate stem may facilitate
highly efficient translation of mRNAs with a perfectly
matched DB.

The T7g10 constructs have one or two relatively
rare AGC serine codons (4.7 per 1000, Figure 12), one of
which is encoded in the NheI site. This codon is
present in the Nt-pHK38 and Nt-pHK40 plants, which
contain the highest levels of NPTII. Further
improvement may be expected by replacing the AGC with an
AGT serine codon.
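
Whether a codon replacement is silent can be checked against the genetic code. The sketch below uses only a small excerpt of the standard code (serine, arginine and alanine codons); the table excerpt and the function names are introduced here for illustration.

```python
# Excerpt of the standard genetic code; only the codons needed for this check.
CODON_TO_AA = {
    "TCT": "Ser", "TCC": "Ser", "TCA": "Ser", "TCG": "Ser",
    "AGT": "Ser", "AGC": "Ser",
    "CGT": "Arg", "CGC": "Arg", "CGA": "Arg", "CGG": "Arg",
    "AGA": "Arg", "AGG": "Arg",
    "GCT": "Ala", "GCC": "Ala", "GCA": "Ala", "GCG": "Ala",
}

NHEI_SITE = "GCTAGC"


def is_silent(old_codon: str, new_codon: str) -> bool:
    """True when both codons encode the same amino acid (KeyError if not in the excerpt)."""
    old, new = (c.upper().replace("U", "T") for c in (old_codon, new_codon))
    return CODON_TO_AA[old] == CODON_TO_AA[new]


def retains_nhei_site(sequence: str) -> bool:
    """True when the NheI recognition sequence is still present."""
    return NHEI_SITE in sequence.upper().replace("U", "T")
```

The AGC to AGT change is silent under this check. Note that when the AGC lies within GCTAGC, the same change also alters the NheI recognition sequence, which `retains_nhei_site("GCTAGT")` reports as lost.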



EXAMPLE 3

The clpP, psbB and psbA TCRs have distinct expression
characteristics

NPTII accumulation was studied in transplastomic
tobacco carrying the PrrnLclpP promoter derivatives. The
PrrnLclpP+DBwt (Nt-pHK32-2F) and PrrnLclpP-DB
(Nt-pHK33-2A) plants accumulate 1.2% and 0.2% NPTII,
respectively, in their leaves (Figure 10; Table 3). We
have found that over-expression of the clpP 5'-UTR causes
a mutant phenotype manifested as pale green leaf color and
slower growth. This phenotype is normalized in older
plants. We assume that the primary cause of this mutant
phenotype is the lack of ClpP protein, the clpP gene
product. This mutant phenotype is absent in plants
transformed with other 5'UTRs. Therefore, we believe that
the mutant phenotype is attributable to competition for a
clpP-specific nuclear factor. The clpP gene has two introns.
Preliminary RNA gel blot analysis reveals reduced levels
of mature, monocistronic clpP mRNA (~30% of wild type)
and accumulation of intron I-containing clpP pre-mRNA in
the pale-green leaves. Normalization of the phenotype
coincides with an increase of translatable monocistronic
clpP mRNA to wild-type levels. Over-expression of the clpP
5'UTR therefore may interfere with splicing of clpP
pre-mRNA.

NPTII accumulation was also studied in
transplastomic tobacco carrying the PrrnLpsbB promoter
derivatives. The PrrnLpsbB+DBwt (Nt-pHK36-1C) and
PrrnLpsbB-DB (Nt-pHK37-2D) plants accumulate 2.2% and 2.4%
NPTII, respectively, in their leaves (Figure 10; Table 3).
Thus, the synthetic DB sequence in the case of the psbB TCR
efficiently replaces the native DB sequence.
Alternatively, the psbB TCR may rely on a different
mechanism for translation initiation.

The Prrn promoter constructs with the psbA leader
were obtained as described. However, we have been able
to introduce only one of them, PrrnLpsbA-DB(+GC), into
tobacco plastids, in line Nt-pHK43-1C. The Nt-pHK43-1C
plants accumulate NPTII at a relatively low level (0.6%;
Figure 10, Table 3). It is conceivable that the lack of
success in introducing the +DB construct is due to a
dramatically elevated expression level of NPTII, which
is toxic to the plants.


DISCUSSION 

NPTII levels obtained from the PrrnLclpP+DBwt
(Nt-pHK32-2F) promoter are relatively low, only 1.2% of the
total soluble protein. However, this promoter is
desirable for driving expression of selectable marker
genes, as the recovery of transplastomic clones is
relatively efficient when the neo gene is expressed from
this promoter, as shown in Example 4. Expression of neo
from the PrrnLclpP+DBwt promoter does not cause a mutant
phenotype in tissue culture. Thus, it is suitable to
drive the expression of marker genes, so long as the
marker gene is subsequently removed. It appears that
competition for a nuclear-encoded factor required for
processing the clpP introns gives rise to the reduced
expression observed. This intron is absent in the clpP
genes in the monocots rice (Hiratsuka et al., 1989) and
maize (Maier et al., 1995). The PrrnLclpP+DBwt promoter
therefore may be used to advantage in the transformation
of monocots. Furthermore, the trans-factor
required for clpP intron processing is likely to be
expressed at different levels in different dicots. We
anticipate, therefore, that expression of the clpP TCR will
have no undesirable consequences in other dicot species. It is
also possible that the phenotypic consequences of
expressing the clpP TCR in plastids are a property of the
tobacco line utilized herein, N. tabacum cv. Petit Havana,
and are absent in other tobacco lines. This would
make the clpP gene TCR a desirable expression tool in
both monocots and dicots.

Both psbB leader derivatives accumulate NPTII at
comparable levels (2.2% and 2.4%, respectively; Table
3). This 5' regulatory region is a good alternative to
the most commonly used rbcL leader when protein
accumulation is required in the ~2% range.

In the past, the psbA promoter and leader construct
yielded relatively high levels of expression in leaves
(2.5% GUS; Staub and Maliga, 1993). Yet these
constructs did not contain psbA DB elements. The
present invention describes the generation of chimeric
promoters that are suitable to obtain high-level protein
expression while elucidating the regulatory role played
by DB sequences. Prrn is the strongest known promoter
in plastids and consequently provides for high levels of
NPTII translation. These elevated levels of NPTII can
be toxic to the plant and therefore it is difficult to
obtain transplastomic lines with the highest prospective
levels of NPTII. An alternative approach involves
operably linking the psbA leader to a relatively weak
promoter. This approach may generate cassettes which
are suitable for obtaining relatively high levels of
protein accumulation from relatively low levels of mRNA.


EXAMPLE 4 

NPTII accumulation in roots and seeds 

Posttranscriptional regulation is an important
mechanism of plastid gene expression (Rochaix, 1996;
Stern et al., 1997). Therefore, we expected that NPTII
accumulation may be tissue-specific due to regulation of
gene expression at the level of mRNA translation. Thus,
NPTII accumulation was tested in roots and seeds.

Testing of NPTII accumulation in roots was carried
out with a subset of transplastomic lines (Table 4).
Roots for protein extraction were collected from plants
grown in liquid MS salt medium (3% sucrose) in sterile
cultures incubated on a shaker to facilitate aeration.
Protein was extracted from the roots with the leaf
protocol and tested for NPTII accumulation (Figure
13A). The highest level of NPTII, 0.75%, is found in the
roots of plants expressing NPTII from the clpP TCR
(PrrnLclpP+DBwt construct; pHK32). The second highest
value, 0.3%, was found in the roots of plants
transformed with plasmid pHK38 expressing NPTII from the
T7g10 TCR (PrrnLT7g10+DB/Ec promoter). The level of
NPTII was about the same, approximately 0.1%, in roots
expressing the recombinant protein from the atpB and
rbcL TCR in pHK30- and pHK34-transformed plants.

Since plastids in the roots are smaller than in
leaves, we expected lower levels of NPTII accumulation
in the roots than in the leaves. This was true for all
the tested roots, except those of the Nt-pHK32 plants.
Interestingly, NPTII from the clpP TCR accumulated at
almost the same level in the roots (0.75%, Table 4) as
in the leaves (approximately 1%, Table 3). This is
likely attributable to high levels of the neo mRNA in
the roots (Figure 13B). Since the clpP leader includes
the minimal PclpP-53 promoter (Sriraman et al., 1998a;
NAR 26: 4874), we speculate that the relatively high
mRNA levels are due to activation of PclpP-53 in roots.
High levels of expression make the clpP leader a
desirable TCR for protein expression in roots.

The T7g10 leader (pHK38) was the most efficient in
roots, yielding the most NPTII relative to
the mRNA (Table 4). Although the Nt-pHK38 plants contained
about 7-fold less neo mRNA than the Nt-pHK32 plants,
NPTII levels were only about 2.5-fold lower (approximately
0.30% compared to 0.75%) than in the plastids with the
clpP TCR (pHK32). High-level NPTII accumulation from the
T7g10 TCR in leaves (pHK38, pHK40; Table 3) and in roots
(pHK38; Table 4) indicates the general utility of the
phage T7g10 translation control region for protein
expression in plastids.
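
The comparison of translation efficiency in roots can be recomputed from the means given in Table 4. In the sketch below the efficiency is taken as NPTII (%) divided by relative neo mRNA (%), multiplied by 10^3, which reproduces the last column of Table 4; the dictionary and function names are illustrative.

```python
# (NPTII % of total soluble protein, neo mRNA % of highest) from Table 4 means.
TABLE4 = {
    "Nt-pHK30-1D": (0.14, 33.7),   # atpB TCR
    "Nt-pHK32-2F": (0.75, 100.0),  # clpP TCR
    "Nt-pHK34-9C": (0.12, 23.5),   # rbcL TCR
    "Nt-pHK38-2E": (0.31, 13.4),   # T7g10 TCR
}


def efficiency(line: str) -> float:
    """NPTII accumulated per unit of neo mRNA, scaled by 10^3."""
    nptii, mrna = TABLE4[line]
    return 1000.0 * nptii / mrna


def most_efficient() -> str:
    """Line with the highest NPTII yield relative to its mRNA level."""
    return max(TABLE4, key=efficiency)
```

By this measure the T7g10 line Nt-pHK38-2E has the highest value (about 23), roughly three times that of the clpP line Nt-pHK32-2F (7.5), even though its mRNA level is about 7.5-fold lower.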

Protein accumulation was also studied in seeds
harvested from the transgenic plants (Figure 14).
Protein levels were 0.05% in plants transformed with
pHK32 (clpP TCR), and approximately 0.01% in plants
transformed with plasmid pHK30 (atpB TCR). No NPTII was
detectable in plants in which neo was introduced in the
rbcL TCR construct (plasmid pHK34), indicating
differential protein accumulation which is dependent on
the choice of the TCR.

Table 4.

Levels of NPTII and neo mRNA in tobacco roots

Strain         NPTII (%)    neo mRNA (%)   NPTII/neo mRNA × 10^3

Nt-pHK30-1D    0.14±0.05    33.7             4.2
Nt-pHK32-2F    0.75±0.35    100              7.5
Nt-pHK34-9C    0.12±0.03    23.5             5.1
Nt-pHK38-2E    0.31±0.04    13.4            23.1

EXAMPLE 5

High-level NPTII expression facilitates efficient
recovery of transplastomic lines by selection for
kanamycin resistance

The plastid genome of higher plants is a 120-kb to
160-kb double-stranded DNA which is present in 1,900 to
50,000 copies per leaf cell (Bendich, 1987). To obtain
genetically stable transplastomic lines, every one of the
plastid genome copies (ptDNA) should be uniformly
altered in a plant. Since integration of foreign DNA
always occurs by homologous recombination, plastid
transformation vectors contain segments of the plastid
genome to target insertions at specific locations.
Useful, non-selectable genes are cloned next to the
selectable marker genes, and are then introduced into
the plastid genome by linkage to the selectable marker
gene (Maliga, 1993). Transforming DNA is introduced into
plastids by the biolistic process (Svab et al., 1990;
Svab and Maliga, 1993) or PEG treatment (Golds et al.,
1993; O'Neil et al., 1993). Elimination of wild-type
genome copies occurs during repeated cell divisions on a
selective medium. The success of transformation depends
on the success of selective amplification of the few
initially transformed genome copies. Therefore, the
choice of the antibiotic used for the selective
amplification of transformed genome copies and the
mechanism by which the plant cells are protected from
antibiotic action are critical parameters to be
considered for successful generation of homoplasmic
plants.

The most commonly used antibiotic for the selection
of transplastomic lines is spectinomycin, an inhibitor
of protein synthesis on plastid ribosomes. Initially,
plastid transformation in tobacco was carried out by
selection for resistance based on mutations in the
plastid 16S rRNA (Svab et al., 1990). Selection was
inefficient, yielding about one transplastomic clone per
50 bombarded samples, probably because the 16S rRNA
based mutation is recessive. Recovery of transplastomic
lines was enhanced ~100-fold by selection for a dominant
marker, spectinomycin resistance based on inactivation
by aminoglycoside 3"-adenyltransferase encoded in a
chimeric aadA gene (Svab and Maliga, 1993). In addition
to tobacco, selection for spectinomycin resistance
(aadA) could be applied to recover transplastomic lines
in Arabidopsis and potato. The aadA gene in plants
confers resistance to both spectinomycin and
streptomycin. Selection for streptomycin resistance was
used for plastid transformation in rice, a species
resistant to spectinomycin, after bombardment with a
chimeric aadA gene. See Example 8.

The need for an alternative marker gene for plastid
manipulation has led to testing kanamycin resistance as
a selective marker. A chimeric neo (kan) gene, encoding
neomycin phosphotransferase, was suitable to recover
transplastomic tobacco lines. However, recovery of
transplastomic lines was relatively inefficient,
yielding only one transplastomic line in ~25 bombarded
leaf samples. Furthermore, for every plastid
transformation event ~25 to 50 kanamycin-resistant lines
were obtained in which integration of the plastid neo
construct into the nuclear genome resulted in kanamycin
resistance (Carrer et al., 1993). We report here that
the efficiency of recovering transplastomic clones is
significantly improved when transforming tobacco
chloroplasts with a new neo gene expressed from a
promoter with the atpB or clpP translation control
region. The number of nuclear transformation events is
reduced using the cassettes of the present invention.
These improvements make the new neo gene a practical
tool for plastid genome manipulations.

DISCUSSION 

The chimeric neo genes described in Examples 1-4
were introduced into plastids by selection for the
linked spectinomycin resistance (aadA) gene, as their
suitability for directly selecting transplastomic lines
was unknown. The transplastomic lines listed in Table 3
were then tested for resistance to kanamycin by their
ability to proliferate on a medium containing 50 mg/L
kanamycin. The RMOP medium used for testing induces
formation of green callus and shoot regeneration in the
absence of kanamycin. The tissue culture procedures
utilized for this example are described in references
Carrer et al., 1993 and Carrer and Maliga, 1995.

On the selective kanamycin medium only scanty, white
callus forms from wild-type leaf sections. Formation of
green callus and shoots from leaf sections of plants
transformed with the pHK plasmids in Table 3 indicates
that accumulation of NPTII confers kanamycin resistance.
We set out to test whether transplastomic clones can be
directly selected by kanamycin resistance after
bombardment with plasmids pHK30 and pHK32. The results
are summarized in Table 5.

Bombardment of 25 tobacco leaves with plasmid pHK30
yielded 45 kanamycin resistant lines on a medium
containing 50 mg/L kanamycin. Transplastomic neo lines
are expected to be resistant to much higher levels, 500
mg/L of kanamycin (Carrer et al., 1993). In addition, in
plasmid pHK30 the neo gene is physically linked to a
spectinomycin resistance (aadA) gene. Spectinomycin
resistance is manifested in the same way as kanamycin
resistance: sensitive leaf sections form white callus
and no shoots, whereas resistant leaf sections form
green callus and shoots on selective (500 mg/L) RMOP
medium. We assumed, therefore, that all transplastomic
lines should be resistant to both 500 mg/L of kanamycin
and 500 mg/L spectinomycin (Carrer and Maliga, 1995).
When applying this test we found that 22 of the 45 lines
met these criteria. Digestion of the plastid DNA with the
EcoRI restriction enzyme and probing with the plastid
targeting region should detect a 3.1-kb fragment in the
wild-type and 4.2-kb and 1.2-kb fragments in
transplastomic lines (Figure 15A). DNA gel blot analysis
of seven of the kanamycin-spectinomycin resistant lines
confirmed integration of both transgenes into the
plastid genome (Figure 15B). Therefore, we assume that
all 22 kanamycin-spectinomycin resistant lines are
transplastomic (Table 5).

Bombardment of 30 tobacco leaves with plasmid pHK32
yielded 28 kanamycin resistant lines on a medium
containing 50 mg/L kanamycin. We have identified 11
double-resistant lines by testing these on a medium
containing 500 mg/L of kanamycin and 500 mg/L
spectinomycin. All six tested were transplastomic by DNA
gel blot analysis; we therefore assume
that all eleven are transplastomic (Table 5).



TABLE 5

SELECTION OF TRANSPLASTOMIC TOBACCO
CLONES BY KANAMYCIN RESISTANCE

Vector   No. leaves   Kan. Res. 50 mg/L   Kan. Res. 500 mg/L   Kan. Res. 500 mg/L + Spec. Res. 500 mg/L   Transplastomic

pTNH32  29  59  7  0
50 a  52  2
25 a  47  4  1
pHK30  25  45  22  22
pHK32  30  28  11  11

(a Carrer et al., 1993)
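The selection efficiencies reported for pHK30 and pHK32 can be compared with a short calculation. The following is an illustrative sketch only: it uses the leaf and line counts stated in the text and in Table 5 for these two vectors, and the function and variable names are hypothetical.

```python
# Illustrative only: counts are those reported for pHK30 and pHK32;
# all names below are hypothetical and not part of the original disclosure.
table5 = {
    # vector: (bombarded leaves, lines resistant to 50 mg/L kanamycin,
    #          lines scored as transplastomic)
    "pHK30": (25, 45, 22),
    "pHK32": (30, 28, 11),
}


def selection_summary(leaves, kan_resistant, transplastomic):
    """Return (transplastomic lines per bombarded leaf,
    fraction of kanamycin-resistant lines that are transplastomic)."""
    return transplastomic / leaves, transplastomic / kan_resistant


for vector, counts in table5.items():
    per_leaf, fraction = selection_summary(*counts)
    print(f"{vector}: {per_leaf:.2f} per leaf, "
          f"{fraction:.0%} of kanamycin-resistant lines")
```

Expressing the result per bombarded leaf makes the new cassettes directly comparable with the earlier figure of about one transplastomic line per 25 bombarded leaves.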

DISCUSSION 

Plastid transformation efficiency should be
comparable if we target the same region of the plastid
genome for insertion, use targeting sequences of similar
size and use the same method of DNA delivery.
Therefore, the lower transformation efficiencies obtained
by selection for kanamycin resistance with the old
chimeric neo genes were likely due to the lack of
recovery of transplastomic clones by selection. We have
found that transformation with neo genes expressed from the
PrrnLatpB+DBwt and PrrnLclpP+DBwt promoters is as
efficient as with the aadA gene. This is a significant
technical advance, and will facilitate plastid
transformation in crops in which the regenerable
tissues contain non-green plastids. The most important
targets are the non-green plastids of cereal crops.
Kanamycin selection is widely used to obtain transgenic
lines after transformation with chimeric neo genes in
dicots. However, kanamycin is an undesirable selective
agent in monocots such as cereal tissue cultures.
NPTII also inactivates paromomycin, however, which may
be used to recover nuclear gene transformants at an
extremely high efficiency in cereals. See, for example,
PCT application WO99/05296.

EXAMPLE 6 

Bacterial bar gene expression in tobacco plastids 
confers resistance to the herbicide phosphinothricin 

Bialaphos, a non-selective herbicide, is a
tripeptide composed of two L-alanine residues and an
analog of glutamic acid known as phosphinothricin (PPT).
While PPT is an inhibitor of glutamine synthetase in
both plants and bacteria, the intact tripeptide has
little or no inhibitory effect in vitro. Bialaphos is
toxic for bacteria and plants, as intracellular
peptidases remove the alanine residues and release
active PPT. Bialaphos is produced by Streptomyces
hygroscopicus. The bacterium is protected from
phosphinothricin toxicity by phosphinothricin
acetyltransferase (PAT), the bar gene product. This
enzyme acetylates phosphinothricin or
demethylphosphinothricin (Thompson et al., 1987). PPT
resistant crops have been obtained by expressing the S.
hygroscopicus bar gene in the plant nucleus. Herbicide
resistant lines were obtained by direct selection for
PPT resistance in culture after Agrobacterium
tumefaciens-mediated DNA delivery in tobacco, potato,
Brassica napus and Brassica oleracea (De Block et al.,
1987, 1989). Biolistic DNA delivery of chimeric bar
genes has been employed to obtain PPT resistant maize
(Spencer et al., 1990), rice (Cao et al., 1992) and
Arabidopsis thaliana (Sawasaki et al., 1994).
Construction of transplastomic tobacco plants, in which
PPT resistance is based on the expression of bar from S.
hygroscopicus in plastids, is described in the present
example. The vectors utilized to express the bar gene
contain an exemplary chimeric 5' regulatory region as
set forth in the previous examples. The following
materials and methods facilitate the practice of this
aspect of the present invention.

Construction of plastid bar gene 

An NcoI/XbaI bar gene fragment was generated by PCR
amplification using plasmid pDM302 (Cao et al., 1992)
with the following primers:

P1, 5'-AAACCATGGCACCACAAACAGAGAGCCCAGAACGACGCCC-3';

P2, 5'-AAAATCTAGATCATCAGATCTCGGTGACG-3'.
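The engineered cloning sites can be located in the two primers with a simple sequence search. The sketch below is illustrative only: the recognition sequences are the standard ones for NcoI (CCATGG) and XbaI (TCTAGA), the primer sequences are those given above, and the function name is hypothetical.

```python
# Recognition sequences of the two enzymes named in the text.
SITES = {"NcoI": "CCATGG", "XbaI": "TCTAGA"}

# Primer sequences as given above (5' to 3').
PRIMERS = {
    "P1": "AAACCATGGCACCACAAACAGAGAGCCCAGAACGACGCCC",
    "P2": "AAAATCTAGATCATCAGATCTCGGTGACG",
}


def find_sites(seq, sites=SITES):
    """Return {enzyme: [0-based start positions]} for each site found in seq."""
    seq = seq.upper()
    hits = {}
    for enzyme, motif in sites.items():
        positions = []
        start = seq.find(motif)
        while start != -1:
            positions.append(start)
            start = seq.find(motif, start + 1)
        if positions:
            hits[enzyme] = positions
    return hits


for name, primer in PRIMERS.items():
    print(name, find_sites(primer))
```

In P1 the NcoI site overlaps the ATG start codon, and in P2 the XbaI site directly precedes the reverse complement of the two stop codons.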

The ends of the PCR fragment were blunt ended by
treatment with the Klenow fragment of DNA polymerase I.
The fragment was then ligated into the EcoRV site of
pBluescript II KS+ (Stratagene, La Jolla, CA) to create
plasmid pJEK3. Sequence analysis of pJEK3 plasmid DNA
revealed that the XbaI site we intended to create
through PCR amplification of pDM302 is absent. See
Figure 19. The bar gene has the two translation
termination codons followed by vector sequences. The
last 20 bp of pJEK3 are:

CCCGTCACCGAGATCTGATGAtcgaattcctgcagcccgggggatccactagttctaga.

The bar sequences are in capitals (stop codons
underlined), the vector sequences are in lower case
(XbaI site underlined). Since there is an XbaI site
present in the vector 40 bp from the intended XbaI site,
it was not necessary to repair this error. The NcoI-XbaI
fragment from plasmid pJEK3 was ligated into NcoI-XbaI
digested pGS104 plasmid (Serino and Maliga, 1997) to
generate plasmid pJEK6. Plasmid pGS104 carries a
Prrn-TrbcL expression cassette in a pPRV111B plastid
transformation vector. A map of the plastid targeting
region of plasmid pJEK6 is shown in Figure 16A.



Plastid transformation and plant regeneration 

Tobacco (Nicotiana tabacum cv. Petit Havana)
plants were grown aseptically on agar-solidified medium
containing MS salts (Murashige and Skoog, 1962) and
sucrose (30 g/l). Leaves were placed abaxial side up on
RMOP medium for bombardment. The RMOP medium consists of
MS salts, N6-benzyladenine (1 mg/l), 1-naphthaleneacetic
acid (0.1 mg/l), thiamine (1 mg/l), inositol (100 mg/l),
agar (6 g/l), pH 5.8, and sucrose (30 g/l). The DNA was
introduced into chloroplasts on the surface of 1 µm
tungsten particles using the DuPont PDS1000He Biolistic
gun (Maliga, 1995). Spectinomycin resistant clones were
selected on RMOP medium containing 500 µg/ml
spectinomycin dihydrochloride. Resistant shoots were
regenerated on the same selective medium and rooted on
MS agar medium (Svab and Maliga, 1993). The
independently transformed lines are designated by the
transforming plasmid (pJEK6) and a serial number, for
example pJEK6-2, pJEK6-5. Plants regenerated from the
same transformed line are distinguished by letters, for
example pJEK6-2A, pJEK6-2B.

Southern Blot Analysis 

Total cellular DNA was isolated from wild-type and
transgenic spectinomycin resistant plants with CTAB
(Saghai-Maroof et al., 1984). The DNA was digested with
the SmaI and BglII restriction endonucleases, separated
on a 0.7% agarose gel and blotted onto a Hybond-N nylon
membrane (Amersham, Arlington Heights, IL) by a pressure
blotter. The membrane was hybridized overnight with an
ApaI/BamHI fragment labeled with [α-32P]dCTP using a
dCTP DNA Labeling Beads Kit (Pharmacia Inc, Piscataway,
NJ). The membrane was washed 2 times with 0.1X SSPE,
0.2X SDS at 55 °C for 30 minutes. Film was exposed to the
membrane for 30 minutes at room temperature.

PAT Assay 

The PAT assay was performed as described by Spencer
et al. (1990). Leaf tissue (100 mg) from wild type
tobacco (wt), transgenic Nt-pDM307-10 tobacco (a line
transformed with the nuclear bar gene in plasmid pDM307;
Cao et al., 1992), and plastid bar gene transformants
was homogenized in 1 volume of extraction buffer (10 mM
Na2HPO4, 10 mM NaCl). The supernatant was collected after
spinning in a microfuge for 10 minutes. Protein (25 mg)
was added to 1 mg/ml PPT and 14C-labeled acetyl-CoA. The
reaction was incubated at 37°C for 30 minutes and the
entire reaction was spotted onto a TLC plate. Ascending
chromatography was performed in a 3:2 mixture of
1-propanol and NH4OH. Film was exposed to the TLC plate
overnight at room temperature.



Herbicide Application 

Wild type and transgenic plants were sprayed with 5
ml of a 2% solution of Liberty (AgrEvo, Wilmington, DE)
with an aerosol sprayer.



RESULTS AND DISCUSSION 

First, the bacterial bar gene was converted into a
plastid gene by cloning the bar coding region into a
plastid expression cassette. This cassette consists of
an engineered plastid rRNA operon promoter (Prrn) and
TrbcL, the 3' UTR of the plastid rbcL gene, for
stabilization of the mRNA. The plastid bar gene was then
cloned into the plastid transformation vector to yield
plasmid pJEK6, and introduced into plastids on the
surface of microscopic tungsten particles. The bar gene
integrated into the plastid genome by two homologous
recombination events via the plastid targeting
sequences, as shown in Figure 16A. Selection for the
linked aadA (spectinomycin resistance) gene on
spectinomycin-containing medium eventually yielded cells
which carried a uniformly transformed plastid genome
population, which were then regenerated into plants.

Integration of bar and aadA was verified by DNA gel
blot analysis. Total cellular DNA of wild-type and
transplastomic plants was digested with the SmaI and
BglII restriction enzymes and probed with the 2.9-kb
ApaI-BamHI plastid targeting fragment of N. tabacum
(Figure 16B). The two fragments that were expected for
the transgenic plants, 3.3 kb and 1.9 kb, were present
in each of the transplastomic samples shown in Figure
16B. Absence of the 2.9 kb wild type fragment indicated
that, by the time these plants were regenerated, the
wild-type plastid genome copies had been diluted out on
the selective medium.

To determine whether the plastid bar gene was
expressed, leaf extracts were assayed for
phosphinothricin acetyltransferase (PAT) activity.
Conversion of PPT into acetyl-PPT indicated PAT activity
in each of the tested transplastomic lines. Data in
Figure 17 are shown for the transplastomic lines
Nt-pJEK6-2D, Nt-pJEK6-5A and Nt-pJEK6-13B. Interestingly,
PAT activity was significantly (>>10-fold) higher when
bar was expressed in the plastids, as compared to the
bar gene expressed from the cauliflower mosaic virus 35S
promoter in the nucleus of the Nt-pDM307-10 plant.

PAT expression confers resistance to PPT in tissue
culture and in the greenhouse. When wild type leaf
sections are grown in tissue culture, 10 mg/L PPT
completely blocks callus proliferation. This same PPT
concentration is suitable for the selection of nuclear
transformants after bombardment with the nuclear bar
construct in plasmid pDM307. Leaf sections of plants
expressing bar in plastids show resistance in the
presence of up to 100 mg/L PPT in the culture medium. We
have tested PPT resistance in the greenhouse, spraying
wild-type and transplastomic plants with Liberty, a
commercial formulation of PPT, at the recommended field
dose of 2%. As shown in Figure 18A, 13 days after the
treatment, the wild type plants were dead while the
transgenic plants thrived. Since then the sprayed plants
have flowered and set seed. Figure 18B shows maternal
inheritance of PPT resistance. Lack of plastid pollen
transmission results in a lack of herbicide resistance
in progeny pollinated with transgenic pollen. The
bacterial bar gene has a high G + C content (68.3%;
GenBank Accession No. X17220), while plastid genes have
a relatively high A + T content; for example, the G + C
content of the highly expressed psbA and rbcL genes is
42.7% and 43.7%, respectively (GenBank Accession No.
Z00044). Differences in the G + C content are also
reflected in the codon usage biases. Interestingly, data
presented here indicate that expression of bar from S.
hygroscopicus is sufficiently high to confer resistance
to field levels of the non-selective herbicide PPT.
Furthermore, the PAT enzyme levels obtained in the
transplastomic lines are significantly higher than those
observed in the nuclear transformant. Therefore, further
improvement of the expression levels may be obtained by
optimizing the codon usage for plastids as set forth in
Example 7.
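The G + C comparison made above is a simple base count. The following minimal sketch shows how such a percentage can be computed; the function name is hypothetical, and the example input is primer P1 from this example, used only to exercise the function (it is not the full bar coding region, so its value is not the 68.3% quoted above).

```python
def gc_content(seq):
    """Return the G + C percentage of a nucleotide sequence."""
    seq = seq.upper()
    if not seq:
        raise ValueError("empty sequence")
    gc = sum(seq.count(base) for base in "GC")
    return 100.0 * gc / len(seq)


# Arbitrary short input: primer P1 from the construction of the plastid bar gene.
primer_p1 = "AAACCATGGCACCACAAACAGAGAGCCCAGAACGACGCCC"
print(f"{gc_content(primer_p1):.1f}% G + C")
```

Applied to complete coding regions retrieved from the cited accessions, the same count would allow the bacterial and plastid genes to be compared directly.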

Advantages of incorporating bar in the plastid
genome include containment of herbicide resistance due to
the lack of pollen transmission in most crops.
Furthermore, the lack of genetic segregation would
simplify back-crossing for the introduction of herbicide
resistance into additional breeding lines.

EXAMPLE 7 

A Synthetic bar gene Improves Containment and
Enhances Expression in Plastids

The bacterial bar gene was introduced into the
tobacco plastid genome by transformation with plasmid
pJEK6, as described above in Example 6. In plasmid pJEK6
bar is expressed in a cassette consisting of the
Prrn(L)rbcL(S) promoter and TrbcL transcription
terminator. This plasmid conferred PPT resistance to
plants grown in the presence of PPT in the tissue
culture medium, but direct selection for transformed
lines was not possible. Although the PAT levels in
homoplastomic leaves were high, the amount of PAT
produced by the few pJEK6 bar copies during the early
stage of plastid transformation was probably
insufficient to protect the entire cell.

To improve bar expression in plastids, a synthetic
gene was created. The codon usage was modified to mimic
that of the average tobacco photosynthetic plastid gene.
Changing the codon usage led to a lowered GC content,
characteristic of higher plant plastid genes. To assist
with cloning, restriction enzyme recognition sequences
were removed and added as necessary. Codon usage
frequency in bacteria reflects relative tRNA abundance:
frequent use of codons for rare tRNAs may significantly
reduce translation efficiency. We hoped that
differential codon usage in plastids and bacteria would
reduce or prevent expression of the synthetic gene in
bacteria, thereby reducing the danger of horizontal gene
transfer to microorganisms. We also hoped that improved
bar expression in our novel promoter cassettes would
allow direct selection of plastid transformants on
PPT-containing medium.
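The recoding strategy described above, replacing codons with synonymous codons without altering the encoded protein, can be illustrated with a short sketch. This is not the procedure used to design s-bar: the preference table below is a hypothetical placeholder, whereas an actual design would use codon frequencies compiled from plastid genes of the target species. All names are assumptions made for illustration.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# HYPOTHETICAL preferences, for illustration only.
PREFERRED = {"R": "AGA", "A": "GCT", "P": "CCA", "E": "GAA"}


def translate(cds):
    """Translate an in-frame coding sequence into one-letter amino acids."""
    cds = cds.upper()
    return "".join(CODON_TABLE[cds[i:i + 3]] for i in range(0, len(cds), 3))


def recode(cds, preferred=PREFERRED):
    """Swap each codon for the listed synonymous codon, if one is given."""
    cds = cds.upper()
    if len(cds) % 3:
        raise ValueError("coding sequence length must be a multiple of 3")
    out = []
    for i in range(0, len(cds), 3):
        codon = cds[i:i + 3]
        out.append(preferred.get(CODON_TABLE[codon], codon))
    return "".join(out)


original = "ATGGCCCGCCCGGAG"  # invented test sequence encoding M-A-R-P-E
recoded = recode(original)
print(recoded, translate(recoded) == translate(original))
```

Because only synonymous substitutions are made, the translation check at the end confirms that the protein sequence is unchanged while the nucleotide composition shifts.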

Materials and Methods for Example 7

Codon comparisons of photosynthetic (rbcL, psaA,
psaB, psaC, psbA, psbB, psbC, psbD, psbE, psbF) plastid
genes were compiled using GCG (Genetics Computer Group,
Madison, WI). DNA mutations were then introduced into
the bacterial bar gene, making its codon usage more
similar to that of plastid genes while removing several
restriction enzyme sites that could interfere with
cloning. See Figure 28. The synthetic bar gene (s-bar)
was obtained by single-step assembly of the entire s-bar
gene from 28 oligonucleotides (one 44 nt primer, one 30
nt primer and twenty-six 40 nt primers) using PCR
(Stemmer et al., 1995). The top and bottom strands of
the primers overlap with each other by 20 nucleotides.
NcoI and NheI sites were added at the 5' end and an XbaI
site was added at the 3' end through PCR amplification.
To obtain the complete s-bar gene, a small aliquot of
the assembly PCR product was amplified using primers 1A
and 14B. Unchanged nucleotides are in upper case,
altered nucleotides are in lower case in the primers
listed below.



Primer 1A   ccATGgctAGCCCAGAAaGAaGaCCGGCCGAtATtaGaCG
Primer 1B   GCATaTCaGCtTCtGTaGCACGtCtaATaTCGGCCGGtCt
Primer 2A   TGCtACaGAaGCtGAtATGCCaGCaGTtTGtACaATCGTt
Primer 2B   CTTGTtTCtATaTAaTGGTTaACGATtGTaCAaACtGCtG
Primer 3A   AACCAtTAtATaGAaACAAGtACaGTaAACTTtaGaACtG
Primer 3B   tTCtTGaGGTTCtTGaGGtTCaGTtCtaAAGTTtACtGTa
Primer 4A   AaCCtCAaGAACCtCAaGAaTGGACtGAtGAtCTaGTCCG
Primer 4B   AaGGATAGCGCTCtCGtAGACGGACtAGaTCaTCaGTCCA
Primer 5A   TCTaCGaGAGCGCTATCCtTGGCTtGTaGCaGAaGTtGAC
Primer 5B   GCGATaCCaGCtACtTCaCCGTCaACtTCtGCtACaAGCC
Primer 6A   GGtGAaGTaGCtGGtATCGCaTAtGCGGGCCCtTGGAAGG
Primer 6B   CCAaTCaTAtGCaTTtCtTGCCTTCCAaGGGCCCGCaTAt
Primer 7A   CAaGaAAtGCaTAtGAtTGGACaGCtGAaTCaACtGTtTA
Primer 7B   GtTGaTGaCGtGGtGAaACGTAaACaGTtGAtTCaGCtGT
Primer 8A   CGTtTCaCCaCGtCAtCAaCGtACaGGACTtGGtTCtACt
Primer 8B   TTCAGtAGaTGtGTaTAtAGaGTaGAaCCaAGtCCtGTaC
Primer 9A   CTaTAtACaCAtCTaCTGAAaTCttTGGAGGCACAaGGtT
Primer 9B   aACAGCtACaACaCTCTTaAAaCCtTGTGCCTCCAaaGAt
Primer 10A  TtAAGAGtGTtGTaGCTGTtATaGGatTGCCtAAtGAtCC
Primer 10B  CtTCaTGCATGCGtACaCtTGGaTCaTTaGGCAatCCtAT
Primer 11A  aAGtGTaCGCATGCAtGAaGCtCTaGGATATGCtCCaaGa
Primer 11B  CCtGCaGCCCtCAaCATaCCtCttGGaGCATATCCtAGaG
Primer 12A  GGtATGtTGaGGGCtGCaGGtTTCAAaCAtGGaAACTGGC
Primer 12B  tTGCCAaAAACCtACaTCATGCCAGTTtCCaTGtTTGAAa
Primer 13A  ATGAtGTaGGTTTtTGGCAaCTtGAtTTCAGtCTaCCaGT
Primer 13B  GtAGaACtGGACGaGGaGGTACtGGtAGaCTGAAaTCaAG
Primer 14A  ACCtCCtCGTCCaGTtCTaCCaGTtACtGAGATCTGATGA
Primer 14B  tctagaTCATCAGATCTCaGTaACtG
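The 20-nucleotide overlap between adjacent top- and bottom-strand oligonucleotides can be checked computationally. The sketch below is illustrative only: it tests primers 1A, 1B and 2A as listed above, and the function names are hypothetical.

```python
# Primer sequences copied from the list above (5' to 3').
PRIMER_1A = "ccATGgctAGCCCAGAAaGAaGaCCGGCCGAtATtaGaCG"
PRIMER_1B = "GCATaTCaGCtTCtGTaGCACGtCtaATaTCGGCCGGtCt"
PRIMER_2A = "TGCtACaGAaGCtGAtATGCCaGCaGTtTGtACaATCGTt"

OVERLAP = 20  # overlap length stated in the text


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence in upper case."""
    pairs = {"A": "T", "C": "G", "G": "C", "T": "A"}
    return "".join(pairs[base] for base in reversed(seq.upper()))


def anneals(top_segment, bottom_segment):
    """True if the two segments are exact reverse complements of each other."""
    return top_segment.upper() == reverse_complement(bottom_segment)


# The 3' end of 1A should pair with the 3' end of 1B, and the
# 5' end of 1B should pair with the 5' end of the next top-strand primer, 2A.
print(anneals(PRIMER_1A[-OVERLAP:], PRIMER_1B[-OVERLAP:]))
print(anneals(PRIMER_2A[:OVERLAP], PRIMER_1B[:OVERLAP]))
```

Running the same check over every adjacent pair in the list would be one way to detect transcription errors in an oligonucleotide set before ordering it.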
The amplified s-bar coding region was then cloned
into a pBSIIKS+ plasmid (Stratagene, La Jolla, CA) and
sequenced (Figure 20A). The s-bar gene was cloned into
cassettes with the chimeric PrrnLatpB+DBwt,
PrrnLrbcL+DBwt and PrrnLT7g10+DB/Ec promoters. Table 6
sets forth the plasmids used in the practice of this
example.

Table 6. Plasmids with bar genes.

Plasmid   Promoter            bar                 3' UTR   Vector

pK05                          synthetic (s-bar)            pBSIIKS+
pK03      PrrnLatpB+DBwt      synthetic (s-bar)   TrbcL    pPRV111B
pK08      PrrnLrbcL+DBwt      synthetic (s-bar)   TrbcL    pPRV111A
pK017     PrrnLT7g10+DB/Ec    synthetic (s-bar)   TrbcL    pPRV111B
pK012     PrrnLrbcL+DBwt      bacterial (bar)     TrbcL    pPRV111A

To provide a suitable cloning site at the 3'-end of
the bacterial bar gene, the EagI/BglII fragment of s-bar
was replaced with the cognate fragment of the bacterial
bar coding region. Such a bacterial bar gene is
incorporated in plasmid pK012 (Figure 21). In plasmid
pK012 the first 22 nucleotides of the bacterial bar
coding region are replaced with nucleotides from the
s-bar.

RESULTS

The engineered bacterial bar gene in pJEK6 is
expressed both in E. coli and plants, as shown in the
previous example. We were interested to test whether
modification of the codon usage affects expression of the
s-bar gene in plastids and in E. coli. In E. coli, s-bar
expression was determined by measuring PAT activity.
Extracts were prepared from bacteria carrying plasmids
pK03 and pK08 expressing s-bar from the PrrnLatpB+DBwt
and PrrnLrbcL+DBwt promoters, respectively. The
radioactive assay did not detect any activity, although
extracts from bacteria transformed with plasmids pJEK6
and pK012 carrying the bacterial bar genes gave strong
signals (Figure 22A). In plasmid pK012 the first 22
nucleotides of the bacterial bar coding region are
replaced with nucleotides from the s-bar. Therefore,
lack of expression from the s-bar in E. coli is not due
to changes within the first 22 nucleotides.

The s-bar was also introduced into plastids by
transformation with vector pK03. Extracts were prepared
from pK03- and pJEK6-transformed tobacco plants, which
carry the s-bar and bar genes, respectively. Extracts
from both types of plants contained significant PAT
activity (Figure 22B). Therefore, the synthetic bar is
expressed in plastids but not in E. coli.

Changing the bar gene codon usage abrogated
expression of the gene in E. coli. This is likely due to
the introduction of the rare AGA and AGG arginine codons
in the s-bar coding region. The triplet frequency per
thousand nucleotides for AGA and AGG is the lowest in E.
coli, reflecting low abundance of the tRNA required for
translation of these codons. The minor arginine
tRNA Arg(AGG/AGA) has been shown to be a limiting factor in
the bacterial expression of several mammalian genes. The
coexpression of the argU (dnaY) gene that encodes
tRNA Arg(AGG/AGA) resulted in high level production of the
target protein (Makrides, 1996). The bacterial bar gene
has 14 arginine codons, none of which are the rare
AGA/AGG codons. The s-bar gene has five of them, three
of which are located within the first 25 codons.
Therefore, the likely explanation for the lack of s-bar
expression in E. coli is introduction of the rare AGA
and AGG arginine codons in the s-bar coding region.
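The rare arginine codons discussed above can be counted directly from a coding sequence. The sketch below is illustrative only: the input is the portion of s-bar contained in primer 1A, read in frame from its ATG start codon, and the function and variable names are hypothetical.

```python
RARE_ARG = {"AGA", "AGG"}
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}


def arginine_codon_counts(cds):
    """Return (all arginine codons, rare AGA/AGG codons) for an in-frame
    sequence; trailing bases that do not complete a codon are ignored."""
    cds = cds.upper()
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    total = sum(codon in ARG_CODONS for codon in codons)
    rare = sum(codon in RARE_ARG for codon in codons)
    return total, rare


# 5' end of s-bar: primer 1A without its two leading bases, so that the
# sequence starts at the ATG start codon.
s_bar_5prime = "ATGgctAGCCCAGAAaGAaGaCCGGCCGAtATtaGaCG"
print(arginine_codon_counts(s_bar_5prime))
```

For this fragment every arginine codon found is AGA, which agrees with the statement that three rare codons lie within the first 25 codons of s-bar.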

There are proteins which are toxic to E. coli but
whose expression is desirable in plastids, to which they
are not toxic. Engineering of these proteins in E. coli
poses a problem, since the commonly used PEP plastid
promoters are active in E. coli; thus the gene will be
transcribed and the mRNA translated. Incorporation of
minor codons in the coding region will prevent
translation of these proteins in E. coli. Particularly
useful in this regard is conversion of arginine codons
to AGA/AGG. If no arginine is present in the N-terminal
region, an N-terminal fusion may be designed containing
multiple AGA/AGG codons to prevent translation of the
mRNA.

Plants under field conditions are associated with
microbes living in the soil, on the leaves and inside
the plants. Gene flow from plastids to these
microorganisms has not been shown. However, it would be
an added safety measure to incorporate codons in plastid
genes which are rare in the target microorganisms, but
are efficiently translated in plastids. Incorporation of
AGA/AGG codons into the selective marker genes and the
genes of interest will prevent transfer of genes from
plants to microbes which lack the capacity to
efficiently translate the AGA/AGG codons. In the case of
specific plant-microbe associations, genes could be
designed, based on differences in codon usage
preferences, which would be expressed in plastids but
not in microbes.

Attempts to directly select transplastomic clones
after bombardment with the s-bar constructs have so far
failed. The s-bar coding region in Figure 20A contains
frequent and rare codons in proportions characteristic
of plastid genes. It is possible that relatively rare
codons in a specific context at a critical stage will
prevent recovery of plastid transformation events.
Examples of tissue-specific translation of mRNAs
dependent on tRNA availability are known (Zhou et al.,
1999). Therefore, we designed a second synthetic bar
gene, s2-bar, containing only frequent codons (Figure
20B). Plastid transformation with the s2-bar will enable
direct selection of plastid transformation events by PPT
resistance.

EXAMPLE 8

FLUORESCENT ANTIBIOTIC RESISTANCE MARKER FOR FACILE
IDENTIFICATION OF TRANSPLASTOMIC CLONES IN TOBACCO AND
RICE

Plastid transformation in higher plants is
accomplished through a gradual process, during which all
the 300-10,000 plastid genome copies are uniformly
altered. Antibiotic resistance genes incorporated in the
plastid genome facilitate maintenance of transplastomes
during this process. Given the high number of plastid
genome copies in a cell, transformation unavoidably
yields chimeric tissues, in which the transplastomic
cells need to be identified and regenerated into plants.
In chimeric tissue, antibiotic resistance is not cell
autonomous: transplastomic and wild-type sectors both
are green due to phenotypic masking by the transgenic
cells. Novel genes encoding FLARE-S, a fluorescent
antibiotic resistance enzyme conferring resistance to
spectinomycin and streptomycin, which were obtained by
translationally fusing aminoglycoside
3''-adenylyltransferase [AAD] with the Aequorea victoria
green fluorescent protein (GFP), are provided in the
present example. FLARE-S facilitates distinction of
transplastomic and wild-type sectors in the chimeric
tissue, thereby significantly reducing the time and
effort required to obtain genetically stable
transplastomic lines. The utility of FLARE-S to select
for plastid transformation events was shown by tracking
segregation of transplastomic and wild-type plastids in
tobacco and rice plants after transformation with
FLARE-S plastid vectors and selection for resistance to
spectinomycin and streptomycin, respectively.

Plastid transformation vectors contain a selectable
marker gene and passenger gene(s) flanked by homologous
plastid targeting sequences (Zoubenko et al., 1994), and
are introduced into plastids by biolistic DNA delivery
(Svab et al., 1990; Svab and Maliga, 1993) or PEG
treatment (Golds et al., 1993; Koop et al., 1996;
O'Neill et al., 1993). The selectable marker genes may
encode resistance to spectinomycin, streptomycin or
kanamycin. Resistance to the drugs is conferred by the
expression of chimeric aadA (Svab and Maliga, 1993) and
neo (kan) (Carrer et al., 1993) genes in plastids. These
drugs inhibit chlorophyll accumulation and shoot
formation on plant regeneration media. The
transplastomic lines are identified by the ability to
form green shoots on bleached wild-type leaf sections.
Obtaining a genetically stable transplastomic line
involves cultivation of the cells on a selective medium,
during which the cells divide at least 16 to 17 times
(Moll et al., 1990). During this time wild type and
transformed plastids and plastid genome copies gradually
sort out. The extended period of genome and organellar
sorting yields chimeric plants consisting of sectors of
wild-type and transgenic cells (Maliga, 1993). In the
chimeric tissue antibiotic resistance conferred by aadA
or neo is not cell autonomous: transplastomic and
wild-type sectors are both green due to phenotypic masking by
the transgenic tissue. Chimerism necessitates a second
cycle of plant regeneration on a selective medium. In
the absence of a visual marker this is an inefficient
process, involving antibiotic selection and
identification of transplastomic plants by PCR or
Southern probing. The feasibility of visual
identification of transformed sectors greatly reduces
the effort required to obtain homoplastomic clones.

The Aequorea victoria green fluorescent protein
(GFP) is a visual marker, allowing direct imaging of the
fluorescent gene product in living cells without the
need for prolonged and lethal histochemical staining
procedures. Its chromophore forms autocatalytically in
the presence of oxygen and fluoresces green when
absorbing blue or UV light (Prasher et al., 1992;
Chalfie et al., 1994; Heim et al., 1994; reviewed in
Prasher, 1995; Cubitt et al., 1995; Misteli and
Spector, 1997). The gfp gene was modified for expression
in the plant nucleus by removing a cryptic intron and
introducing mutations to enhance brightness and to
improve GFP solubility (Pang et al., 1996; Reichel et
al., 1996; Rouwendal et al., 1997; Haseloff et al.,
1997; Davis and Vierstra, 1998). GFP was used to monitor
protein targeting to nucleus, cytoplasm and plastids
from nuclear genes (Sheen et al., 1995; Chiu et al.,
1996; Köhler et al., 1997), and to follow virus movement
in plants (Baulcombe et al., 1995; Epel et al., 1996).
GFP has also been used to detect transient gene
expression in plastids (Hibberd et al., 1998).

The expression of GFP by directly incorporating the
gfp gene in the plastid genome is described herein.
Incorporation of a visual marker, the GFP protein, in
the plastid transformation vectors of the present
invention facilitates distinction of spontaneous
antibiotic resistant mutants and plastid transformants
(Svab et al., 1990). Furthermore, transplastomic sectors
in the chimeric tissue can be visually identified,
significantly reducing the time and effort required for
obtaining genetically stable transplastomic lines. The
utility of the GFP marker described here is further
enhanced by its fusion with the enzyme aminoglycoside
3''-adenylyltransferase [AAD] conferring spectinomycin
and streptomycin resistance to plants. Using a marker
gene encoding a bifunctional protein, FLARE-S
(fluorescent antibiotic resistance enzyme, spectinomycin
and streptomycin), prevents physical separation of the
two genes and simplifies engineering. Furthermore,
fluorescent antibiotic resistance genes enable
extension of plastid transformation to cereal crops, in
which plastid transformation is not associated with a
readily identifiable tissue culture phenotype.

The following protocols are provided to
facilitate the practice of the present example.

Construction of tobacco plastid vectors. The
aadA16gfp gene encodes the FLARE16-S fusion protein, and
can be excised as an NheI-XbaI fragment from plasmid
pMSK51, a pBSKSII+ derivative (GenBank Accession No.:
not yet assigned). The fusion protein was obtained by
cloning gfp (from plasmid pCD3-326F) downstream of aadA
(in plasmid pMSK38), digesting the resulting plasmid
with BstXI (at the 3' end of the aadA coding region) and
NcoI (including the gfp translation initiation codon)
and linking the two coding regions by a BstXI-NcoI
compatible adapter. The adapter was obtained by
annealing oligonucleotides
5'-GTGGGCAAAGAACTTGTTGAAGGAAAATTGGAGCTAGTAGAAGGTCTTAAAGTCGC-3'
and
5'-CATGGCGACTTTAAGACCTTCTACTAGCTCCAATTTTCCTTCAACAAGTTCTTTGCCCACTACC-3'.
The adapter connects AAD and GFP with a
peptide of 16 amino acid residues (ELVEGKLELVEGLKVA).
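
As a quick consistency check on the adapter design, the linker
peptide can be recomputed from the top-strand oligonucleotide.
The sketch below is illustrative only and is not part of the
original protocol. The helper name translate and the
reading-frame bookkeeping are assumptions made for this example:
the first nine nucleotides are taken to encode the last three
aadA codons, the final codon is taken to be completed by the
first C of the NcoI site, and the standard genetic code is used.

```python
# Standard genetic code, built from the conventional TCAG ordering.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(dna: str) -> str:
    """Translate the complete codons of an in-frame DNA string (5'->3')."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Top-strand adapter oligonucleotide as given in the text.
top = "GTGGGCAAAGAACTTGTTGAAGGAAAATTGGAGCTAGTAGAAGGTCTTAAAGTCGC"

# Assumption: the last, incomplete codon (GC-) is completed by the first C
# of the NcoI site (CCATGG) that carries the gfp initiation codon.
in_frame = top + "C"

peptide = translate(in_frame)
aad_tail, linker = peptide[:3], peptide[3:]
print("aadA C-terminal residues:", aad_tail)
print("linker peptide:", linker)
```

Under these assumptions the translated linker agrees with the
16-residue peptide stated above.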

The engineered aadA gene (Chinault et al.,
1986) in plasmid pMSK38 (pBSIIKS+ derivative) has NcoI
and NheI sites at the 5' end and BstXI and XbaI sites at
the 3' end of the gene. The NcoI site includes the
translation initiation codon; the NheI and BstXI sites
are in the coding region close to the 5' and 3' ends,
respectively; the XbaI site is downstream of the stop codon.
The mutations were introduced by PCR using
oligonucleotides
5'-GGCCATGGGGGCTAGCGAAGCGGTGATCGCCGAAGTATCG-3' and
5'-CGAATTCTAGACATTATTTGCCCACTACCTTGGTGATCTC-3'.

The gfp gene in plasmid CD3-326F is the
derivative of plasmid psmGFP, encoding the soluble
modified version of GFP (accession number U70495),
obtained under order number CD3-326 from the Arabidopsis
Biological Resource Center, Columbus, OH (Davis and
Vierstra, 1998). The gfp gene in plasmid CD3-326F is
expressed in the PpsbA/TpsbA expression cassette. The
gfp gene in plasmid CD3-326F was obtained through the
following steps. The BamHI-SacI fragment from CD3-326
was cloned into the pBSKS+ vector to yield plasmid CD3-326A.
The SacI site downstream of the coding region was
converted into an XbaI site by blunting and linker
ligation (5'-GCTCTAGAGC-3'; plasmid CD3-326B). An NcoI site
was created to include the translation initiation codon
and at the same time the internal NcoI site was removed
by PCR amplification of the coding region N-terminus
with primers
5'-CCGGATCCAAGGAGATATAACACCATGGCTAGTAAAGGAGAAGAACTTTTC-3'
and 5'-GTGTTGGCCAAGGAACAGGTAGTTTTCC-3'. The PCR-amplified
fragment was digested with BamHI and MscI
restriction enzymes, and the resulting fragment was used
to replace the BamHI-MscI fragment in plasmid CD3-326B
to yield plasmid CD3-326C. The gfp coding region was
excised from plasmid CD3-326C as an NcoI-XbaI fragment
and cloned into a psbA cassette to yield plasmid CD3-326D.
PpsbA and TpsbA are the psbA gene promoter and
3'-untranslated region derived from plasmid pJS25
(Staub and Maliga, 1993). TpsbA has been truncated by
inserting a HindIII linker downstream of the modified
BspHI site (Peter Hajdukiewicz, unpublished). The
PpsbA::gfp::TpsbA gene was excised as an EcoRI-HindIII
fragment and cloned into EcoRI and HindIII digested
pPRV111A, to yield plasmid CD3-326F.

The aadA16gfp coding region from plasmid pMSK51 was
introduced into two expression cassettes. In plasmid
pMSK53 the aadA16gfp coding region is expressed in the
PrrnLrbcL+DBwt/TpsbA cassette, and encodes the FLARE16-S2
protein (fluorescent antibiotic resistance enzyme,
spectinomycin). PrrnLrbcL+DBwt is described in the
previous examples and derives from plasmid pHK14. The
construct contains a chimeric promoter composed of the
rrn operon promoter, the rbcL gene leader and downstream
box sequence. TpsbA is the psbA gene 3' untranslated
region, and functions to stabilize the chimeric mRNA. In
plasmid pMSK54 the aadA16gfp coding region is expressed
in the PrrnLatpB+DBwt/TpsbA cassette, and encodes the
FLARE16-S1 protein. PrrnLatpB+DBwt derives from plasmid
pHK10, and is a chimeric promoter composed of the rrn
operon promoter, the atpB leader and downstream box
sequence. See Examples 1-4.

The chimeric aadA16gfp genes were introduced
into the tobacco plastid transformation vector pPRV111B
(Zoubenko et al., 1994). The aadA gene was excised from
plasmid pPRV111B with EcoRI and SpeI restriction
enzymes, and replaced with the EcoRI-SpeI fragment from
plasmids pMSK53 and pMSK54 to generate plasmids pMSK57
(aadA16gfp-S2) and pMSK56 (aadA16gfp-S1).

Construction of rice plastid vectors. Plasmid
pMSK49 is a rice-specific plastid transformation vector
which carries the aadA11gfp-S3 gene as the selective
marker in the trnV/rps12/7 intergenic region (GenBank
Accession Number: not yet assigned). Plasmid pMSK49
carries the rice SmaI-SnaBI plastid fragment
(restriction sites at nucleotides 122488 and 125878 in
the genome; Hiratsuka et al., 1989) cloned into a
pBSKSII+ (Stratagene) vector after blunting the SacI and
KpnI restriction sites. The XbaI site present in the
rice plastid DNA fragment (position at nucleotide 125032
in the genome; Hiratsuka et al., 1989) was removed by
filling in and religation. Prior to cloning the
selective marker, the progenitor plasmid was digested
with the BglII restriction enzyme, giving rise to a
deletion of 119 nucleotides between two proximal BglII
sites (positions at 124367 and 124491). The aadA11gfp-S3
gene was then cloned in the blunted BglII sites.

The aadA gene in plasmid pMSK49 was obtained by
modifying the aadA gene in plasmid pMSK38 (above) to
obtain plasmid pMSK39. The modification involved
translationally fusing the aadA gene product at its
N-terminus with an epitope of the human c-Myc protein
(amino acids 410-419; EQKLISEEDL; Kolodziej and Young,
1991). The genetic engineering was performed by ligating
an adapter obtained by annealing complementary
oligonucleotides with appropriate overhangs into
NcoI-NheI digested pMSK38 plasmid. The oligonucleotides were:
5'-CATGGGGGCTAGCGAACAAAAACTCATTTCTGAAGAAGACTTGc-3' and
5'-CTAGGCAAGTCTTCTTCAGAAATGAGTTTTTGTTCGCTAGCCCC-3'.
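
The statement that the two oligonucleotides anneal with
"appropriate overhangs" can be illustrated with a short check.
The sketch below is an assumption-laden illustration rather than
the authors' procedure: the function name revcomp and the fixed
4-nucleotide overhang length are choices made for this example,
and the final lowercase c of the first oligonucleotide is written
in upper case.

```python
def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


# Adapter oligonucleotides as given in the text (5'->3').
top = "CATGGGGGCTAGCGAACAAAAACTCATTTCTGAAGAAGACTTGC"
bottom = "CTAGGCAAGTCTTCTTCAGAAATGAGTTTTTGTTCGCTAGCCCC"

OVERHANG = 4  # assumed length of each single-stranded 5' extension

# The paired region is what remains after removing each strand's 5' overhang.
duplex_top = top[OVERHANG:]
duplex_bottom = revcomp(bottom)[:-OVERHANG]
anneals = duplex_top == duplex_bottom

top_overhang = top[:OVERHANG]        # expected to match an NcoI end (CATG)
bottom_overhang = bottom[:OVERHANG]  # expected to match an NheI end (CTAG)

print("strands complementary:", anneals)
print("5' overhangs:", top_overhang, bottom_overhang)
```

The two 5' extensions correspond to the four-nucleotide ends left
by NcoI (C^CATGG) and NheI (G^CTAGC), which is consistent with
ligation into NcoI-NheI digested pMSK38.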

The aadA11gfp gene encoding FLARE11-S was obtained
by linking AAD and GFP with the 11-mer peptide
ELAVEGKLEVA. To clone aadA and gfp in the same
polycloning site, gfp (EcoRI-HindIII fragment; from
plasmid CD3-326F) was cloned downstream of aadA in
plasmid pMSK39 to obtain plasmid pMSK41. The two genes
were excised together as an NheI-HindIII fragment, and
cloned into plasmid pMSK45 to replace a
kanamycin-resistance gene, yielding plasmid pMSK48. Plasmid pMSK45
is a derivative of plasmid pMSK35 which carries the
PrrnLT7g10+DB/Ec promoter. The promoter consists of the
plastid rRNA operon promoter and the leader sequence of
the T7 phage gene 10. In plasmid pMSK48, aadA is
expressed from the PrrnLT7g10+DB/Ec promoter. The aadA
and gfp genes were then translationally fused with a
BstXI-NcoI adapter that links AAD and GFP with an
11-mer peptide. The adapter was obtained by annealing
oligonucleotides
5'-GTGGGCAAAGAACTTGCAGTTGAAGGAAAATTGGAGGTCGC-3' and
5'-CATGGCGACCTCCAATTTTCCTTCAACTGCAAGTTCTTTGCCCACTACC-3',
which was ligated into BstXI/NcoI digested pMSK48
plasmid DNA to yield plasmid pMSK49. Plasmid pMSK49 has
the rice plastid targeting sequences present in plasmid
pMSK35.
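
The 11-mer adapter can be checked in the same spirit: the two
strands should pair over the full length of the shorter
oligonucleotide, and the coding strand should translate to the
stated linker. The sketch below is illustrative only. The helper
names, the assumption that the first nine nucleotides encode the
last three aadA codons, and the use of the standard genetic code
are all choices made for this example.

```python
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def revcomp(seq: str) -> str:
    """Reverse complement of an uppercase DNA string."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna: str) -> str:
    """Translate the complete codons of an in-frame DNA string (5'->3')."""
    usable = len(dna) - len(dna) % 3
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3))


# Adapter oligonucleotides as given in the text (5'->3').
top = "GTGGGCAAAGAACTTGCAGTTGAAGGAAAATTGGAGGTCGC"
bottom = "CATGGCGACCTCCAATTTTCCTTCAACTGCAAGTTCTTTGCCCACTACC"

# The longer strand extends four nucleotides beyond the shorter one at each end.
bottom_rc = revcomp(bottom)
left_ext, core, right_ext = bottom_rc[:4], bottom_rc[4:-4], bottom_rc[-4:]
strands_match = core == top

# Assumption: the reading frame runs on into the NcoI-derived CATG, so the
# last codon of the linker is completed by the first base of that extension.
linker = translate(top + right_ext[0])[3:]

print("strands complementary:", strands_match)
print("extensions on the coding strand:", left_ext, right_ext)
print("linker peptide:", linker)
```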

Tobacco plastid transformation. Tobacco leaves from
4- to 6-week-old plants were bombarded with DNA-coated
tungsten particles using the DuPont PDS1000He Biolistic
gun (1100 psi). Transplastomic clones were identified as
green shoots regenerating on bleached leaf sections on
RMOP medium containing 500 mg/L spectinomycin
dihydrochloride (Svab and Maliga, 1993). The
spectinomycin-resistant shoots were illuminated with UV
light (Model B 100AP, UV Products, Upland, California,
USA). Shoots emitting green light were transferred to
spectinomycin-free MS medium (Murashige and Skoog, 1962)
(3% sucrose) on which fluorescent (transplastomic) and
non-fluorescent (wild-type) sectors formed. Fluorescent
sectors were excised, and transferred to selective (500
mg/L spectinomycin) shoot regeneration (RMOP) medium.
Regenerated shoots were tested for uniform
transformation by Southern analysis.

Rice plastid transformation. Callus formation from
mature Oryza sativa cv. Taipei 309 seeds was induced on
a modified CIM medium (Tompson et al., 1986), containing
MS salts and vitamins (2 mg/L glycine, 0.5 mg/L
nicotinic acid, 0.5 mg/L pyridoxine and 0.1 mg/L
thiamine), 2 mg/L 2,4-D, 1 mg/L kinetin and 300 mg/L
casein enzymatic hydrolysate Type III (Sigma C-1026) and
sucrose (30 g/L). Embryogenic suspensions from the
proliferating embryogenic calli were obtained on the AA
medium (Muller and Grafe, 1978). For plastid
transformation by the biolistic process, rice embryogenic
cells were plated on a filter paper on non-selective
modified CIM medium (Tompson et al., 1986). The
bombarded cells were incubated for 48 hours, transferred
to selective liquid AA medium (Muller and Grafe, 1978)
(one to two weeks), and then to solid modified RRM
regeneration medium (Zhang and Wu, 1988) containing MS
salts and vitamins, 100 mg/L myo-inositol, 4 mg/L BAP,
0.5 mg/L IAA, 0.5 mg/L NAA, 30 g/L sucrose, 40 g/L
maltose and 100 mg/L streptomycin sulfate, on which green
shoots appeared in two to three weeks. The shoots were
rooted on a selective MS salt medium (Murashige and
Skoog, 1962) containing 30 g/L sucrose and 100 mg/L
streptomycin sulfate. Leaf samples for PCR analysis and
confocal microscopy were taken from plants on selective
medium.

PCR amplification of border fragments. Total
cellular DNA was extracted according to Mettler
(Mettler, 1987). The PCR analysis was carried out with a
9:1 mixture of AmpliTaq (Stratagene) and Vent (New
England Biolabs) DNA polymerases in the Vent buffer
following the manufacturer's recommendations. The left
border fragment was amplified with primers 03
(5'-ATGGATGAACTATACAAATAAG-3') and 04
(5'-GCTCCTATAGTGTGACG-3'). The right border fragment was
amplified with primers 05 (5'-ACTACCTCTGATAGTTGAGTCG-3')
and 06 (5'-AGAGGTTAATCGTACTCTGG-3'). The aadA part of
FLARE-S genes was amplified with primers 01
(5'-GGCTCCGCAGTGGATGGCGGCCTG-3') and 02
(5'-GGGCTGATACTGGGCCGGCAGG-3'). Primer positions are shown
in Fig. 5A. Note that the same primers can be used in
transplastomic tobacco and rice plants expressing
FLARE-S.

Detection of FLARE-S by fluorescence. FLARE-S
expressing sectors in the leaves were visualized by an
Olympus SZX stereo microscope equipped for GFP detection
with a CCD camera system. Subcellular localization of
GFP was verified by laser-scanning confocal microscopy
(Sarastro 2000 Confocal Image System, Molecular
Dynamics, Sunnyvale, CA). This system includes an argon
mixed gas laser with lines at 488 and 568 nm and
detector channels. The channels are adjusted for
fluorescein and rhodamine images. GFP fluorescence was
detected in the FITC channel (488-514 nm). Chlorophyll
fluorescence was detected in the TRITC channel (560-580
nm). The images produced by GFP and chlorophyll
fluorescence were viewed on a computer screen attached
to the microscope and processed using the Adobe
PhotoShop software.

Immunoblot analysis. Leaves (0.5 g) collected from
plants in sterile culture were frozen in liquid nitrogen
and ground to a fine powder in a mortar with a pestle.
For protein extraction the powder was transferred to a
centrifuge tube containing 1 ml buffer [50 mM Hepes/KOH
(pH 7.5), 1 mM EDTA, 10 mM potassium acetate, 5 mM
magnesium acetate, 1 mM dithiothreitol and 2 mM PMSF]
and mixed by flicking. The insoluble material was
removed by centrifugation at 4°C for 5 min at 11,600 g.
Protein concentration in the supernatant was determined
using the Bio-Rad protein assay reagent kit. Proteins (20
µl per lane) were separated in 12% SDS-PAGE (Laemmli,
1970). Proteins separated by SDS-PAGE were transferred
to a Protran nitrocellulose membrane (Schleicher and
Schuell) using a semi-dry electroblotting apparatus
(Bio-Rad). The membrane was incubated with Living Colors
Peptide Antibody (Clontech) diluted 1 to 200. FLARE-S
was visualized using ECL chemiluminescence immunoblot
detection on X-ray film. FLARE-S on the blots was
quantified by comparison with a dilution series of
commercially available purified wild-type GFP
(Clontech).

RESULTS AND DISCUSSION 
Tobacco plastid vectors with FLARE-S as the
selectable marker.

Two FLARE-S fusion proteins were tested in E. coli.
In one, AAD and GFP were linked by an 11-mer
(ELAVEGKLEVA), in the second by a 16-mer
(ELVEGKLELVEGLKVA) linker. For transformation in
tobacco, the aadA16gfp coding region (16-mer linker) was
expressed in two cassettes known to mediate high levels
of protein accumulation in plastids. Both utilize the
strongest known plastid promoter, which drives the expression
of the ribosomal RNA operon (Prrn), and the 3'-UTR of
the highly expressed psbA gene (TpsbA) for the
stabilization of the chimeric mRNAs. The PrrnLatpB+DBwt
(plasmid pMSK56) and PrrnLrbcL+DBwt (plasmid pMSK57)
promoters utilize the atpB or rbcL gene leader sequences
and the coding region N-termini with the downstream box
(DB) sequence, respectively. Due to inclusion of the DB
sequence in the chimeric genes, the proteins encoded by
the two genes are slightly different, having 14 amino
acids of the ATPase β subunit (atpB gene product) or
ribulose 1,5-bisphosphate carboxylase/oxygenase (rbcL
gene product) translationally fused with FLARE16-S
(FLARE16-S1 and FLARE16-S2, respectively). To obtain a
plastid transformation vector with the fluorescent
spectinomycin resistance genes, the chimeric genes were
cloned into the trnV/rps12/7 plastid intergenic region
in plastid vector pPRV111B. Plasmids pMSK56 and pMSK57
(Fig. 23) express FLARE16-S1 and FLARE16-S2,
respectively, as markers.

Identification of transplastomic tobacco clones by
fluorescence. Transformation was carried out by
biolistic delivery of pMSK56 and pMSK57 plasmid DNA into
chloroplasts. The bombarded leaves were transferred onto
selective (500 mg/L spectinomycin) shoot regeneration
medium. Wild-type leaves on this medium bleach and form
white callus. Cells with transformed plastids regenerate
green shoots. The leaves on the selective medium were
regularly inspected with a hand-held long-wave UV lamp
for FLARE-S fluorescence.

No fluorescence could be detected in young shoots
(3 to 5 mm in size) developing on pMSK56-bombarded
leaves. However, formation of bright sectors in the
leaves was observed when these small shoots were
transferred onto non-selective plant maintenance medium.
In contrast, cultures bombarded with plasmid pMSK57
yielded small fluorescent shoots at an early stage.
These fluorescent shoots, and some of the
non-fluorescent ones, developed into plants with bright
sectors on non-selective plant maintenance medium.
Therefore, FLARE16-S2 is useful for early detection of
plastid transformation events. FLARE16-S2 fluorescence
in young shoots on a selective medium should be due to
relatively high levels of FLARE16-S2. Higher levels of
FLARE16-S2 are also indicated by the brighter sectors in
variegated leaves expressing FLARE16-S2 as compared to
FLARE16-S1.

The size of sectors was different in individual
shoots. FLARE-S expression in different leaf layers was
also obvious. With the traditional selection for
spectinomycin resistance, the transplastomic and
wild-type sectors are not visible. Regeneration of plants
with uniformly transformed plastid genomes was greatly
facilitated by the fluorescing sectors expressing
FLARE-S, which could be readily identified in UV light,
dissected, and transferred for a second cycle of plant
regeneration on spectinomycin-containing (500 mg/L)
selective medium.

Given the high levels of FLARE-S accumulation, we
were interested to find out whether FLARE-S is toxic to
plants. We expected that toxicity would be manifested
as lower transformation efficiencies. Bombardment of 30
tobacco leaves with plasmids pMSK56 and pMSK57 yielded
71 and 89 spectinomycin-resistant clones, respectively.
Out of these, 61 and 77 lines were verified as
transplastomic by fluorescence. Plastid transformation
in a subset of these was confirmed by confocal laser
scanning microscopy (7 clones each; see below) and
Southern analysis (4 clones). The frequency of plastid
transformation events with the FLARE-S-expressing genes
was slightly higher (~2 instead of ~1 per bombardment)
than reported earlier with a chimeric aadA gene at the
same insertion site (Svab and Maliga, 1993). Therefore,
we assume that accumulation of FLARE-S at high levels is
not detrimental. Lack of toxicity is also supported by
the apparently normal phenotype of the plants in the
greenhouse (not shown).
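
The figure of roughly two events per bombardment follows from
the counts given above. The short calculation below is only a
worked illustration: it assumes that 30 leaves were bombarded
with each plasmid (the text could also be read as 30 leaves in
total), and the variable names are hypothetical.

```python
# Assumption: 30 bombarded leaves per plasmid, one bombardment per leaf.
LEAVES_PER_PLASMID = 30

# (spectinomycin-resistant clones, clones verified by fluorescence)
counts = {"pMSK56": (71, 61), "pMSK57": (89, 77)}

summary = {}
for plasmid, (resistant, fluorescent) in counts.items():
    summary[plasmid] = {
        # verified transplastomic clones per bombarded leaf
        "per_bombardment": fluorescent / LEAVES_PER_PLASMID,
        # share of resistant clones that were also fluorescent
        "fraction_fluorescent": fluorescent / resistant,
    }
    print(
        f"{plasmid}: {summary[plasmid]['per_bombardment']:.2f} per bombardment, "
        f"{summary[plasmid]['fraction_fluorescent']:.0%} of resistant clones fluorescent"
    )
```

Under that assumption both plasmids give between two and three
verified clones per bombarded leaf, and in each case somewhat
more than four fifths of the spectinomycin-resistant clones were
fluorescent.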

Localization of FLARE-S to tobacco plastids by
confocal microscopy. Due to phenotypic masking,
transplastomic and wild-type sectors in a chimeric leaf
are both green on a selective medium. However, when
chimeric leaf sectors were observed by confocal
microscopy, we found that within the same cell some
plastids express FLARE-S while others do not
(Fig. 24). FLARE-S and
chlorophyll fluorescence were detected separately in the
fluorescein and rhodamine channels, respectively. The
two images were then overlaid, confirming that FLARE-S
fluorescence derives from chloroplasts.

Expression of FLARE-S was also studied in
non-green plastid types including the chromoplasts in petals
and the non-green plastids in root cells (Fig. 24b, f).
These studies were carried out in plants which were
homoplastomic for the transgenomes. The homoplastomic state
was important, since in non-green tissues chlorophyll
could not be used for confirmation of the organelles as
plastids. Since FLARE-S expression could be readily
detected in chloroplasts as well as non-green plastids,
the plastid rRNA operon promoter is apparently active in
all plastid types.

FLARE-S accumulation in tobacco leaves.
Accumulation of FLARE-S in homoplastomic leaves was
tested using the commercially available GFP antibody,
recognizing the GFP portion (239 amino acid residues) of
FLARE16-S (520 amino acids). FLARE16-S1 (532 amino
acids) was ~8%, whereas FLARE16-S2 (532 amino acids)
was ~18% of total soluble leaf protein (Fig. 25). To
calculate FLARE16-S concentrations, a GFP dilution
series was used as a reference, and the values were then
increased by 2.6 to correct for the larger size of the
FLARE16-S1 and -S2 proteins.

Tracking plastid transformation in rice by FLARE-S
expression. In rice, plant regeneration is from
non-green embryogenic cells. Encouraged by FLARE-S
expression in non-green tobacco plastids, we attempted
to transform the non-green plastids of embryogenic rice
tissue-culture cells. Plastid transformation was carried
out using a rice-specific vector expressing FLARE11-S3
and targeting insertion of the aadA11gfp-S3 gene in the
trnV/rps12/7 intergenic region. The location of the
insertion site and the size of plastid targeting
sequences in the rice vector are similar to the tobacco
vectors shown in Fig. 23.

Plastid transformation in rice was carried out
by bombardment of embryogenic rice suspension culture
cells using gold particles coated with plasmid pMSK49
DNA. Rice cells, like those of most cereals, are naturally
resistant to spectinomycin (Fromm et al., 1987).
FLARE-S, however, confers resistance to streptomycin as well
(Svab and Maliga, 1993). Therefore, selection for
transplastomic lines was carried out on selective
streptomycin medium (100 mg/L). Streptomycin at this
concentration inhibits the growth of embryogenic rice
cells. After bombardment, the rice cells were first
selected in liquid embryogenic AA medium, then on the
solid plant regeneration medium, on which the surviving
resistant cells regenerated green shoots (12 in 25
bombarded plates). These shoots were rooted, and grown
into plants. PCR amplification of border fragments in
DNA isolated from the leaves of these plants confirmed
integration of aadA11gfp-S3 sequences in the plastid
genome (Fig. 26). The left and right border fragments
cannot be amplified if the gene is integrated into the
nuclear genome, as one of the primers (04 or 06) of the
pairs is outside the plastid targeting regions.

FLARE11-S3 expression in the leaves of two of
the PCR-positive plants was tested by confocal
laser-scanning microscopy. In rice, as in tobacco, the FLARE-S
marker confirmed segregation of transplastomic and
wild-type plastids (Fig. 27). In rice only a small fraction
of chloroplasts expressed FLARE-S. Since individual
cells marked with arrows in Fig. 27 contained a mixed
population of wild-type and transgenic chloroplasts,
FLARE-S in these cells could be expressed only from the
plastid genome. Integration of aadA11gfp-S3 into the
nuclear genome downstream of a plastid-targeting transit
peptide would result in uniform expression of FLARE-S in
each of the chloroplasts within the cell.

The sequences of the selectable marker genes of the 
invention are provided in Figures 28-34. Figure 35 
depicts a table describing the selectable marker genes 
disclosed in the present example. 

Direct visual identification of transplastomic
sectors requires high-level expression of FLARE-S in
plastids. High GFP expression levels in Arabidopsis were
toxic, interfering with plant regeneration. Toxicity of
wild-type (insoluble) GFP was linked to GFP accumulation
in the nucleus and cytoplasm, and could be eliminated by
targeting it to the endoplasmic reticulum (Haseloff et
al., 1997). GFP aggregates were also cytotoxic to E.
coli cells (Crameri et al., 1996). To enhance
fluorescence intensity and to avoid cytotoxicity,
soluble versions of the codon-modified GFP were obtained
(Davis and Vierstra, 1998). We have utilized the gene
for a soluble-modified GFP described by Davis and
Vierstra (Davis and Vierstra, 1998) to create variants
of FLARE-S, a fusion protein which does not have an
apparent cytotoxic effect. The frequency of plastid
transformation, if affected at all, is increased rather
than decreased. In tobacco, we normally obtain one
transplastomic clone per bombarded leaf sample (Svab and
Maliga, 1993), whereas with the FLARE-S genes on average
we could recover two clones per sample. Plant
regeneration from highly fluorescent tissue was readily
obtained, and the regenerated plants have a phenotype
indistinguishable from the wild type.

Plastid transformation in rice requires expression
of the selective marker in non-green plastids. The rRNA
operon has two promoters, one for the eubacterial-type
(PEP) and one for the phage-type (NEP) plastid RNA
polymerase. The promoter driving FLARE-S expression is
recognized only by the eubacterial-type plastid RNA
polymerase. Previously, it was assumed that the
eubacterial-type promoter is active only in chloroplasts
(Maliga, 1998). Accumulation of FLARE-S in roots and
petals indicates that PEP is also active in non-green
plastids.

Plastid transformation is a process that
unavoidably yields chimeric plants, since cells of
higher plants contain a large number (300 to 50000) of
plastid genome copies (Bendich, 1987), out of which
initially only a few are transformed. High-level
expression of FLARE-S in plastids provides the means for
visual identification of transplastomic sectors, even if
they are present in a chimeric tissue. GFP and AAD could
be expressed from two different genes in a plastid
transformation vector. However, transformation with a
marker gene encoding a bifunctional protein prevents
separation of the two genes and simplifies engineering.
The fluorescent selective marker will significantly
reduce the work required to obtain genetically stable
plastid transformants in tobacco, a species in which
plastid transformation is routine. The bottleneck of
applying plastid transformation in crop improvement is
the lack of technology. In tobacco, chimeric clones with
transformed plastids are readily identified by shoot
regeneration (Svab et al., 1990). In Arabidopsis, clones
with transformed plastids are identified by greening
(Sikdar et al., 1998). We have shown here that FLARE-S
is a suitable marker to select for transplastomes in
embryogenic rice cells, which lack the visually
identifiable tissue culture phenotypes exploited in
tobacco and Arabidopsis. Data presented here are the
first example of stable integration of foreign DNA into
the rice plastid genome. These rice plants are
heteroplastomic. Uniformly transformed rice plants will
be obtained by further selection on streptomycin medium
and screening the embryogenic cells for FLARE-S
expression. Thus, the FLARE-S marker system will enable
extension of plastid transformation to cereal crops.

The utility of the new chimeric promoters

The σ70-type plastid ribosomal RNA operon promoter,
Prrn, is the strongest known plastid promoter expressed
in all tissue types. The ultimate product of this
promoter in the plastid is RNA, not protein. Therefore, a
series of chimeric promoters were constructed to
facilitate protein accumulation from Prrn, using
expression of the neomycin phosphotransferase (NPTII)
enzyme as the reference protein.

1) The expression cassettes have distinct
tissue-specific expression profiles. Some of the expression
cassettes described here will facilitate relatively high
levels of protein expression in all tissues, including
leaves, roots and seeds. Other cassettes have different
expression profiles: for example, some will facilitate
moderate levels of protein accumulation in the leaves
while leading to relatively high levels of protein
accumulation in the roots. Accumulation of a protein at
levels of 10% to 50% of total soluble protein is
considered high-level protein expression; low levels of
protein expression would be in the range of <0.1% of total
soluble cellular protein.

2) Efficiency of the selectable marker gene
depends on the rate at which the gene product
accumulates during the early stage of transformation.
Since the marker gene is initially present in only a few
copies per cell, high levels of expression from a few
copies will provide protection from toxic substances
early on, facilitating efficient recovery of transformed
lines. The expression cassettes will be useful to drive
the expression of the genes conferring resistance to the
antibiotics streptomycin, spectinomycin and hygromycin,
and the herbicides phosphinothricin and glyphosate. In such
applications addition of amino acids at the N-terminus
is acceptable, as long as it does not interfere with the
expression of the selectable marker genes. NPTII is such
an enzyme. In cases like NPTII, an N-terminal fusion and
thereby the mRNA "Downstream Box" sequences give an
additional increase of at least two- to four-fold in protein
levels. The -DB construct, which relied on an NheI site
and involved addition of one (N-terminal) amino acid of
the source gene coding region, is convenient, but is not
necessary. When translational fusion is not feasible due
to inactivation of proteins, seamless in-frame
constructs may be created by PCR methods outlined in the
application.

3) A second major area in which application of
the chimeric promoters is extremely useful is protein
expression for pharmaceutical, industrial or agronomic
purposes. The examples include, but are not restricted
to, production of vaccines, healthcare products like
human hemoglobin, and industrial or household enzymes.

While certain of the preferred embodiments of the

What is claimed is:

1. A recombinant DNA construct for expressing
at least one heterologous protein in the plastids of
higher plants, said construct comprising a 5' regulatory
region which includes a promoter element, a leader
sequence and a downstream box element operably linked to
a coding region of said at least one heterologous
protein, said chimeric regulatory region enhancing
translational efficiency of an mRNA molecule encoded by
said DNA construct.

2. A vector comprising the DNA construct of
claim 1.

3. A recombinant DNA construct as claimed in
claim 1, said 5' regulatory region being selected from
the group consisting of PrrnLatpB+DBwt, SEQ ID NO: 1,
PrrnLatpB-DB, SEQ ID NO: 2, PrrnLatpB+DBm, SEQ ID NO: 3,
PrrnLclpP+DBwt, SEQ ID NO: 4, PrrnclpP-DB, SEQ ID NO: 5,
PrrnLrbcL+DBwt, SEQ ID NO: 6, PrrnLrbcL-DB, SEQ ID NO: 7,
PrrnLrbcL+DBm, SEQ ID NO: 8, PrrnLpsbB+DBwt, SEQ ID NO: 9,
PrrnLpsbB-DB, SEQ ID NO: 10, PrrnLpsbA+DBwt, SEQ ID NO:
11, PrrnLpsbA-DB, SEQ ID NO: 12, PrrnLpsbA-DB(+GC), SEQ
ID NO: 13.

4. A recombinant DNA construct as claimed in
claim 1, said 5' regulatory region being selected from
the group consisting of PrrnLT7g10+DB/Ec, SEQ ID NO: 14,
PrrnLT7g10+DB/pt, SEQ ID NO: 15, PrrnLT7g10-DB, SEQ ID
NO: 15.

5. A vector comprising a DNA construct as
claimed in claim 1.

6. A DNA construct as claimed in claim 1,
said downstream box element having a sequence selected
from the group consisting of
5'TCCAGTCACTAGCCCTGCCTTCGGCA3' and
5'CCCAGTCATGAATCACAAAGTGGTAA3'.

7. A DNA construct as claimed in claim 1,
wherein said heterologous protein is expressed from a
bar gene encoded by S. hygroscopicus, said bar gene
inserted into a plasmid selected from the group
consisting of pK012, and pJEK3, said pJEK3 having the
sequence of SEQ ID NO: 18.

8. A DNA construct as claimed in claim 1,
wherein said heterologous protein is expressed from a
synthetic bar encoding nucleic acid, said synthetic bar
nucleic acid having a sequence selected from the group
consisting of SEQ ID NO: 19 and SEQ ID NO: 20.

9. A DNA construct as claimed in claim 1,
said at least one heterologous protein comprising a
fusion protein.

10. A DNA construct as claimed in claim 9,
said fusion protein having a first and second coding
region operably linked to said 5' regulatory region such
that production of said fusion protein is regulated by
said 5' regulatory region, said first coding region
encoding a selectable marker gene and said second coding
region encoding a fluorescent molecule to facilitate
visualization of transformed plant cells.

11. A vector comprising the DNA construct of
claim 10.

12. A DNA construct as claimed in claim 9,
said fusion protein consisting of an aadA coding region
operably linked to a green fluorescent protein coding
region.

13. A DNA construct as claimed in claim 10,
said aadA coding region being operably linked to said
green fluorescent protein coding region via a nucleic
acid molecule encoding a peptide linker having a
sequence selected from the group consisting of
ELVEGKLELVEGLKVA and ELAVEGKLEVA.

14. A DNA construct as claimed in claim 10,
said construct having a sequence selected from the group
of SEQ ID NOS: 21-25 and 27.

15. A plasmid for transforming the plastids of
higher plants, said plasmid being selected from the
group consisting of pHK30(B), pHK31(B), pHK60,
pHK32(B), pHK33(B), pHK34(A), pHK35(A), pHK64(A),
pHK36(A), pHK37(A), pHK38(A), pHK39(A), pHK40(A),
pHK41(A), pHK42(A), pHK43(A), pMSK56, pMSK57, pMSK48,
pMSK49, pMSK35, pMSK53 and pMSK54.
16. A transgenic plant containing a plasmid
as claimed in claim 15. 

17. A transgenic plant as claimed in claim
15, said plant being selected from the group consisting
of monocots and dicots.

18. A method for producing transplastomic
monocots, comprising:

a) obtaining embryogenic cells; 

b) exposing said cells to a heterologous DNA 
molecule under conditions whereby said DNA enters the 
plastids of said cells, said heterologous DNA molecule 
encoding at least one exogenous protein, said at least 
one exogenous protein encoding a selectable marker; 

c) applying a selection agent to said cells to 
facilitate sorting of untransformed plastids from 
transformed plastids, said cells containing transformed 
plastids surviving and dividing in the presence of said 
selection agent; 

d) transferring said surviving cells to 
selective media to promote shoot regeneration and 
growth; and 

e) rooting said shoots, thereby producing
transplastomic monocot plants.

19. A method as claimed in claim 18, wherein said
heterologous DNA molecule is introduced into said plant
cell via a process selected from the group consisting of
biolistic bombardment, Agrobacterium-mediated
transformation, microinjection and electroporation.
20. A method as claimed in claim 18, wherein
protoplasts are obtained from said embryogenic cells and
said heterologous DNA molecule is delivered to said
protoplasts by exposure to polyethylene glycol.

21. A method as claimed in claim 18, wherein said
selection agent is selected from the group consisting of
streptomycin and paromomycin.

22. A monocot transformed via the method of claim 18.

23. A transformed monocot plant as claimed in
claim 22, said monocot plant being selected from the
group consisting of maize, millet, sorghum, sugar cane,
rice, wheat, barley, oat, rye, and turf grass.

24. A method for producing transplastomic rice 
plants, said method comprising: 

a) obtaining embryogenic calli; 

b) inducing proliferation of calli on 
modified CIM medium; 

c) obtaining embryogenic cell 
suspensions of said proliferating calli in liquid AA 
medium; 

d) bombarding said embryogenic cells 
with microprojectiles coated with plasmid DNA; 

e) transferring said bombarded cells to
selective liquid AA medium;

f) transferring said cells surviving in 
AA medium to selective RRM regeneration medium for a 
time period sufficient for green shoots to appear; and 
g) rooting said shoots in a selective MS
salt medium.

25. A method as claimed in claim 24, said plasmid
DNA being selected from the group of plasmids consisting
of pMSK35, pMSK53, pMSK54 and pMSK49.

26. A transplastomic rice plant produced by the 
method of claim 24. 

27. A method for containing transgenes in 
transformed plants, comprising: 

a) determining the codon usage in said plant 
to be transformed and in microbes found in association 
with said plant; and 

b) genetically engineering said transgene
sequence via the introduction of rare codons to abrogate
expression of said transgene in said plant-associated
microbe.

28. A method as claimed in claim 27, wherein said 
transgene is a bar gene and said rare codons are 
arginine encoding codons selected from the group 
consisting of AGA and AGG. 
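
The recoding recited in claims 27 and 28 can be illustrated with a short Python sketch. It is a minimal example under stated assumptions, not the procedure used to build the synthetic bar genes: the function names, the alternation between AGA and AGG, and the short test reading frame are all hypothetical. The sketch counts codon usage in an open reading frame and replaces every arginine codon with AGA or AGG.

```python
from collections import Counter

# Standard genetic code: the six codons that encode arginine.
ARG_CODONS = {"CGT", "CGC", "CGA", "CGG", "AGA", "AGG"}
# Arginine codons named in claim 28 as the rare codons to introduce.
RARE_ARG_CODONS = ("AGA", "AGG")


def split_codons(orf):
    """Split an in-frame DNA sequence into upper-case triplets."""
    seq = orf.upper()
    if len(seq) % 3 != 0:
        raise ValueError("ORF length must be a multiple of three")
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def codon_usage(orf):
    """Count how often each codon occurs in the reading frame."""
    return Counter(split_codons(orf))


def recode_arginine(orf):
    """Replace every arginine codon with AGA or AGG (alternating).

    The alternation is an arbitrary choice made for this sketch;
    the encoded protein sequence is unchanged.
    """
    recoded = []
    n_arg = 0
    for codon in split_codons(orf):
        if codon in ARG_CODONS:
            recoded.append(RARE_ARG_CODONS[n_arg % 2])
            n_arg += 1
        else:
            recoded.append(codon)
    return "".join(recoded)


if __name__ == "__main__":
    # Hypothetical test frame encoding M-S-P-E-R-R-P-A.
    example = "ATGAGCCCAGAACGACGCCCGGCC"
    print(recode_arginine(example))  # ATGAGCCCAGAAAGAAGGCCGGCC
    print(codon_usage(recode_arginine(example)))
```

Comparing `codon_usage` output for the plant and for an associated microbe would correspond to step a) of claim 27; the codon tables for real organisms are not included here.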
Decoding Region

1 10 20 26
pt ADB 3'-AGGUC AGUGAUCGGGACGGAA GCCGU-5'
1430 1416

1 10 20 26
Ec ADB 3'-GGGUC AGUACUUAGUGUUUC ACCAUU-5'
1483 1469
atpB wild type

3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (16)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (15)
mRNA AUGAGAAUCAAUCCUACUACUUCUGGUUCUGGGGUUUCCACGCUUGAAAA
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (16)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (13)

3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' ( 9 fl )
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (9/8)
atpB mutant
mRNA AUGAGAAUaAAcCCgACaACaagUGGaagUGGGGUgUCCACGgcuagc
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (11/9)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (10/8)

3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (14)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (14)
clpP wild type
mRNA AUGCCUAUUGGUGUUCCAAAAGUCCCUUUCCGAAGUCCUGGAGAGGAAGA
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (13)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (13)

3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (13/26)
3 ■ J1CCUCACU.7A' CCGAAG'"
rbcL wild type
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (13/26)
mRNA AUGUCACCACAAACAGAGACUAAAGCAAGUGUUGGAUUCAAAGCUGGUGU
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (13/26)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (11/26)

3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (10/5)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (9/7)
rbcL mutant (NheI)
mRNA AUGaguCCuCAgACAGAaACaAAAGCcucaGUaGGAUUCAAAgcuagc
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (11/3)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (9/5)

3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (16)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (15)
psbB wild type
mRNA m G^AUUUCCAUGGGUUUGCCUUGGUAUCGUGUUCAUACCGUUGUAUUGAAUGAUCCCGG
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (17)
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (15)

psbA wild type
mRNA CAUGACUGCAAUUUUAGAGAGACGCGAAAGCGAAAGCCUAUGGGGUCGCUU
3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (13)
T7g10 mRNA AUGGCUAGCAUGACUGGUGGACAGCAAAUGGGUCGCGGAUCCGGCUGCUA
Ec ADB 3'-GGGUCAGUACUUAGUGUUUCACCAUU-5' (15)

T7g10+DB/Ec mRNA AUGGCaAGCAUGACUGGUGGACAGgcuagc
pt ADB 3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (13)

T7g10+DB/pt mRNA AUGGCaAucacuagcccugccuuGgcuagc
pt ADB 3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (21)

T7g10-DB mRNA ACAUAtt^^agcauugaacaagauggauugcau
pt ADB 3'-AGGUCAGUGAUCGGGACGGAAGCCGU-5' (14)
PrrnLatpB+DBwt (pHK10)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG AATTAACCGA

101 TCGACGTGCa AGCGGACATT TATTTTaAAT TCGATAATTT TTGCAAAAAC

151 ATTTCGACAT ATTTATTTAT TTTATTATTA TGAGAATCAA TCCTACTACT

NheI

201 TCTGGTTCTG GGGTTTCCAC Ggctagc

PrrnLatpB-DB (pHK11)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG AATTAACCGA

101 TCGACGTGCa AGCGGACATT TATTTTaAAT TCGATAATTT TTGCAAAAAC

NheI

151 ATTTCGACAT ATTTATTTAT TTTATTATTA TGAGAgctag c

PrrnLatpB+DBm (pHK50)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG AATTAACCGA

101 TCGACGTGCa AGCGGACATT TATTTTaAAT TCGATAATTT TTGCAAAAAC

151 ATTTCGACAT ATTTATTTAT TTTATTATTA TGAGAATaAA cCCgACaACa

NheI

201 agTGGaagTG GGGTgTCCAC Ggctagc

PrrnLclpP+DBwt (pHK12)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG TTACGTTTCC

101 ACCTCAAAGT GAAATATAGT ATTTAGTTCT TTCTTTCATT TAATGCCTAT

NheI

151 TGGTGTTCCA AAAGTCCCTT TCCGAAGTCC TGGAGAGGAA gctagc

PrrnLclpP-DB (pHK13)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG TTACGTTTCC

NheI

101 ACCTCAAAGT GAAATATAGT ATTTAGTTCT TTCTTTCATT TAATGCCTgc

151 tagc
PrrnLrbcL+DBwt (pHK14)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG TCGAGTAGAC

101 CTTGTTGTTG TGAaAATTCT TAATTCATGA GTTGTAGGGA GGGATTTATG

NheI

151 TCACCACAAA CAGAGACTAA AGCAAGTGTT GGATTCAAAg ctagc

PrrnLrbcL-DB (pHK15)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG TCGAGTAGAC

101 CTTGTTGTTG TGAaAATTCT TAATTCATGA GTTGTAGGGA GGGATTTATG

NheI

151 TCAgctagc

PrrnLrbcL+DBm (pHK54)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG TCGAGTAGAC

101 CTTGTTGTTG TGAaAATTCT TAATTCATGA GTTGTAGGGA GGGATTTATG

NheI

151 aguCCuCAgA CAGAaACaAA AGCcucaGTa GGATTCAAAg ctagc

PrrnLpsbB+DBwt (pHK16)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG CAATGCAATA

101 AAGTTACGTA GTGTCTATTT ATCTTTGATA TAAGGGGTAT TTCCATGGGT

NheI

151 TTGCCTTGGT ATCGTGTTCA TACCGTTGTA TTGAATGATg ctagc

PrrnLpsbB-DB (pHK17)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG CAATGCAATA

NcoI NheI

101 AAGTTACGTA GTGTCTATTT ATCTTTGATA TAAGGGGTAT TTccatggct

151 agc
PrrnLpsbA+DBwt (pHK21)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAA AAAAGCCTTC

101 CATTTTCTAT TTTGATTTGT AGAAAACTAG TGTGCTTGGG AGTCCCTGAT

NheI

151 GATTAAATAA ACCAAGATTT TACCATGACT GCAATTTTAG AGAGAgctag

201 c



PrrnLpsbA-DB (pHK22)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAA AAAAGCCTTC

101 CATTTTCTAT TTTGATTTGT AGAAAACTAG TGTGCTTGGG AGTCCCTGAT

NcoI NheI

151 GATTAAATAA ACCAAGATTT TAccatggct agc



PrrnLpsbA-DB(+GC) (pHK23)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG CAAAAAGCCT

101 TCCATTTTCT ATTTTGATTT GTAGAAAACT AGTGTGCTTG GGAGTCCCTG

NcoI NheI

151 ATGATTAAAT AAACCAAGAT TTTAccatgg ctagc
PrrnLT7g10+DB/Ec (pHK18)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG GGAGACCACA

101 ACGGTTTCCC aCTAGAAATA ATTTTGTTTA ACTTTAAGAA GGAGATATAC

NheI

151 ATATGGCaAG CATGACTGGT GGACAGgcta gc

PrrnLT7g10+DB/pt (pHK19)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG GGAGACCACA

101 ACGGTTTCCC aCTAGAAATA ATTTTGTTTA ACTTTAAGAA GGAGATATAC

NheI

151 ATATGGCaAt cactagccct gccttGgcta gc

PrrnLT7g10-DB (pHK20)

SacI

1 gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG

51 GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG GGAGACCACA

101 ACGGTTTCCC aCTAGAAATA ATTTTGTTTA ACTTTAAGAA GGAGATATAC

NheI

151 ATATGgctag c
pPRV111A derivatives

pHK34 (pHK14)
pHK64 (pHK54)
pHK36 (pHK16), ATG in NcoI
pHK38 (pHK18)
pHK39 (pHK19)
pHK41 (pHK21)
pHK35 (pHK15)
pHK37 (pHK17)
pHK40 (pHK20)
pHK42 (pHK22)
pHK43 (pHK23)

[Plasmid map. Labels: left targeting region, 16S rDNA, trnV, Prrn, TrbcL, rps12/7, right targeting region; >1.0 kb; scale bar 1 kb]
pPRV111B derivatives

pHK30 (pHK10)
pHK60 (pHK50)
pHK32 (pHK12)
pHK31 (pHK11)
pHK33 (pHK13)

[Plasmid map. Labels: ATG, Leader, left targeting region, 16S rDNA, right targeting region; 2.1 kb; 1.0 kb; scale bar 1 kb]
PCR-1

primer
SacI
Prrn promoter - pPRV100A (DNA template)
5'
SacI
Prrn promoter > Leader

PCR-2: Construct with wild-type DB (DBwt)

Product of PCR-1
Leader and the coding region (DBwt) DNA template
primer
NheI 5'
NcoI

CCATGgcaccacaaacagagAGCCCAGAACGACGCCCGGCCGACATCCGCCGTGCCACCG

+ + + + + 60

GGTACcgtggtgtttgtctcTCGGGTCTTGCTGCGGGCCGGCTGTAGGCGGCACGGTGGC
MAPQTESPERRPADIRRATE 

AGGCGGACATGCCGGCGGTCTGCACCATCGTCAACCACTACATCGAGACAAGCACGGTCA 

+ x + + + + 120 

TCCGCCTGTACGGCCGCCAGACGTGGTAGCAGTTGGTGATGTAGCTCTGTTCGTGCCAGT 
ADM PAVCT I VN H Y I E T S T V N 

ACTTCCGTACCGAGCCGCAGGAACCGCAGGAGTGGACGGACGACCTCGTCCGTCTGCGGG 

+ + - + + + + 180 

TGAAGGCATGGCTCGGCGTCCTTGGCGTCCTCACCTGCCTGCTGGAGCAGGCAGACGCCC 
FRTEPQEPQ EWTDDLVRLRE 

AGCGCTATCCCTGGCTCGTCGCCGAGGTGGACGGCGAGGTCGCCGGCATCGCCTACGCGG 

+ 4 + — + + + 240 

TCGCGATAGGGACCGAGCAGCGGCTCCACCTGCCGCTCCAGCGGCCGTAGCGGATGCGCC 
RYPWLVAEVDGEVAGIAYAG 

GCCCCTGGAAGGCACGCAACGCCTACGACTGGACGGCCGAGTCGACCGTGTACGTCTCCC 

+ + f + + + 300 

CGGGGACCTTCCGTGCGTTGCGGATGCTGACCTGCCGGCTCAGCTGGCACATGCAGAGGG 
PWKARNAYDW TAESTVYVSP 

CCCGCCACCAGCGGACGGGACTGGGCTCCACGCTCTACACCCACCTGCTGAAGTCCCTGG 

~ + ' + + + + + 360 

GGGCGGTGGTCGCCTGCCCTGACCCGAGGTGCGAGATGTGGGTGGACGACTTCAGGGACC 

RHQRTGLGSTLYTHLLKSLE

AGGCACAGGGCTTCAAGAGCGTGGTCGCTGTCATCGGGCTGCCCAACGACCCGAGCGTGC 

+. +■ + f +. + 420 

TCCGTGTCCCGAAGTTCTCGCACCAGCGACAGTAGCCCGACGGGTTGCTGGGCTCGCACG 
AQGFKSVVAVIGLPNDPSVR 

GCATGCACGAGGCGCTCGGATATGCCCCCCGCGGCATGCTGCGGGCGGCCGGCTTCAAGC 

+ + + + + + 480

CGTACGTGCTCCGCGAGCCTATACGGGGGGCGCCGTACGACGCCCGCCGGCCGAAGTTCG 
MHEALG Y APRGMLRAAG F K H 

ACGGGAACTGGCATGACGTGGGTTTCTGGCAGCTGGACTTCAGCCTGCCGGTACCGCCCC 

. — __ + + + + + + 540 

TGCCCTTGACCGTACTGCACCCAAAGACCGTCGACCTGAAGTCGGACGGCCATGGCGGGG 
GNWHDVGFWQLDFSLPVPPR

BglII

GTCCGGTCCTGCCCGTCACCGAGATCTGATGAtcgaattcctgcagcccgggggatccac
___ + _ + + + + + 

CAGGCCAGGACGGGCAGTGGCTCTAGACTACTagcttaaggacgtcgggccccctaggtg 

XbaI

tagttctaga
+ 610

atcaagatct

Figure 19
NcoI NheI

CcATG cfCtAGCC CAGAAdGAaGaCCGGCCGA-ATtaGaCGTGCtACaGAaGCtGAtATGC 
+ + + + + + 

ggTACcgaTCGGSTCTTtCTtCtGGCCGGCTaTAatCtGCACGaTGtCTtCGaCTaTACG 
MAS PE RRPAD IRRATEADM P 

CaGCaGTtTGtACaATtGTtAAtCAtTAtATaGAaACAAGtACcGTaAACTTtcGaACtG 
+ + + + + f 

GtCGtCAaACaTGtTAaCAaTTaGTaATaTAtCTtTGTTCaTGgCAtTTGAAagCtTGaC 
AVCT IVNHYIETSTVNFRTE 

AaCCtCAaGAACCtCAaGAaTGGACtGAtGAttTaGTCCGTtTaCGaGAGCGCTATCCtT 
+ + + + + + 

TtGGaGTtCTTGGaGTtCTtACCTGaCTaCTaaAtCAGGCAaAtGCtCTCGCGATAGGaA 
PQEPQEWTDDLVRLRER Y PW 

GGCTtGTaGCaGAaGTtGACGGaGAaGTaGCtGGgATtGCaTAtGCGGGCCCgTGGAAaG 
_ + + + + + _ + 

CCGAaCAtCGtCTtCAaCTGCCtCTtCAtCGaCCcTAaCGtATaCGCCCGGGcACCTTtC 
LVAE V D'GEVA G I A Y A G PW KA 

CAcGaAAtGCaTAtGAtTGGACgGCtGAaTCaACtGTgTACGTtTCaCCaCGtCAtCAaC 
+ + + + ___ + + 

GTgCtTTaCGtATaCTaACCTGcCGaCTtAGtTGaCAcATGCAaAGtGGtGCaGTaGTtG 
RNAY DWTAES TVY VSPRHQR 

GgACaGGACTtGGtTCtACttTaTAtACcCAtCTaCTGAAaTCttTGGAGGCACAgGGtT 
_ + + + + + + 

CcTGtCCTGAaCCaAGaTGaaAtATaTGgGTaGAtGACTTtAGaaACCTCCGTGTcCCaA 
TGLGSTLYTHLLKSLEAQGF 

TtAAGAGtGTgGTaGCTGTtATaGGatTGCCgAAtGAtCCctcgGTaCGCATGCAcGAaG 

_u . +■ + + + + 

AaTTCTCaCAcCAtCGACAaTAtCCtaACGGcTTaCTaGGgagcCAtGCGTACGTgCTtC 
KSVVAVIGLPNDPSVRMHEA 

CtCTcGGATATGCtCCcaGaGGtATGtTGaGGGCcGCaGGtTTCAAaCAtGGaAAtTGGC 
+ + + + — + + 

GaGAgCCTATACGaGGgtCtCCaTACaACtCCCGgCGtCCaAAGTTtGTaCCtTTaACCG 
LGYAP RGM LRAAGFKHGN W H 

ATGAtGTaGGTTTrTGGCAaCTtGAcTTCtcttTaCCaGTACCtCCtCGTCCcGTttTaC 

+ + + — t + * 

TACTaCAtCCAAAaACCGTtGAaCTgAAGagaaAtGGtCATGGaGGaGCAGGgCAaaAtG 
DVGFWQLD FSL PV. PPRPVL P 

BglII XbaI

CcGTtACt GAGATCT G AT GA tctaga 
+ + 

GgCAaTGaCTCTAGACTACTagatct 
V T E I * * 

Figure 20A
NcoI NheI

ccAT^CtAGCCCAGAAaGAaGaCCGGCCGAt AT taGaCGTGCt AC aGAaGCt GAt ATGC 

ggTACcgaTCGGGTCTTtCTrCcGGCCGGCTaTAatCtGCACGaTGcCTtCGaCTaTACG 
MAS P E R R P A D I RRATEADMP 

CaGCaGTtTGtACaATtGTtAAtCAtTAtATaGAaACAAGtACaGTaAAtTTtcGaACtG 

+ +— * + — + 

GtCGtCAaACaTGtTAaCAaTTaGTaATaXAtCTtTGTTCaTGtCAtlTaAAagCtrGaC 
AVCTIVNHYIETSTVNFRTE

AaCCtCAaGAACCtCAaGAaTGGACtGAtGAfcTaGTaCGTtTaCGaGAaCGtTATCCtT 
+ + + + — —- + 

TtGGaGTtCTTGGaGTtCTtACCTGaCTaCTaaAtCAtGGAaAtGCtCTtGCaATAGGaA 
PQEPQEWTDDLVRLRERYPW

GGCTtGXaGCaGAaGTtGAcGGaGAaGTaGCtGGaATtGCaTAtGCtGGtCCgTGGAAaG 
+ + + . + -r + 

CCGAaCAtCGtCTtCAaCTgCCtCTtCAtCGaCCtXAaCGtATaCGaCCaGGcACCTTtC 
LVAEVDGEVAGIAYAGPWKA 

CAcGaAAtGCaTAtGAtTGGACaGCtGAaTCaACtGTtTAtGTtTCaCCaCGtCAtCAaC 
+ ' — + , + + + 

GTgCtTIaCGtATaCTaACCTCtCGaCTtAGtTGaCAaATaCiUAGtGGtGCaGraGTtG 
RN AYDWTAESTVYVS PRHQR 

G t ACaGGACT cGGtTCtACt tTaT At ACtCAt CT tCTt AAaTCt tT GGAaGCACAa GGt X 

. + -r + + + + 

CaT Gt CCTGAaCCaAGaTGaaAt ATaTGaGTa GAaGAaTTtAGaaACCTtCGTGTt CCaA 
TGLGSTLYTHLLKSLEAQGF

T t AAaAGtGTa GTaGCTGTtAT aGGa t T GCCgAAtGA t CCc t caGT aCGCATGCAt GAa G 

— — — •»«--•«-« ; - i — — 4p- — — «^— — — r 

AaTTtTCaCAtOVtCGACAaTAtCCtaACGGcTTaCTaGGgagtCAtGCGTACGTaCTtC 
KSVVAVIG LP ND PSVRM HE A 

CtCTtGGATATGCtCCcaGaGGtATGtTGaGGGCaGCaGGtTTCAAaCAtGGaAAtTGGC 
+ + + + + 

GaGAaCCTATACGaGGgrCcCCaTACaACtCCCGtCGtCCaAAGtTtGtaCCcTTaACCG 
LGYAPRGMLRAAGFKHGNWH

ATGAtGTaGGTTTtTGGCAaCTtGAcTTCtCttTaCCaGTACCtCCtCGTCCcGTttTaC 
4. -f +— + + -r 

TACTaCAtCCAAAaACCGXtGAaCTgAAGagaaAtGGtCATGGaGGaGCAGGgCAaaAtG 
DVGFWQLDFSL PVPPRPVLP 

BglII XbaI

CcGTtACtCa^iATJSPGATGAtfitflfla 
FLARE16-S.seq Length: 1574



XGgGGgc 
51 [GAGGTAGTTG 
101 ACATTXGTAC 
XTGATTTGCT 
GCTTTGATCA 
GATTC1CCGC 
CGTGGCGXTA 
AAXGACAXXC 
GGCTATCTTG 
CAGCGGCGGA 
GCGCTAAAXG 
SGATGAGCGA 
TAACCGGCAA 
2GCCTGCCGG 
TCXTGGACAA 
1AXTTGTCCA 



151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 

751 _ 

851 fd^f&AX&a. 

901 XAGATGGXGA 

951 GGTGATGCAA 

1001 &AAACXACCT 

1051 XTCAATGCTT 

1101 AAGAGCGCCA 

1151 GGACGACGGG 

1201 CCCTCGTCAA 

1251 &ACATCCTCG 

1301 ^ATCACGGCA 

1351 SACACAACAT 

1401 AATACXCCAA 

1451 3TCCACACAA 

1501 TGGTCCTTCT 

1551 laasas&SL 



tagcGAAGCG GTGATCGCCG AAGTATCGAC TCAACTATCA 
GCGTCATCGA GCGCCATCTC GAACCGACGT XGCTGGCCGT 
GGCTCCGCAG TGGATGGCGG CCTGAAGCCA CACAGTGAXA 
GGTTACGGTG ACCGTAAG^C XTGAXGAAAC AACGCGGCGA 
ACGACCTTTT GGAAACTTCG GCiTCCCCTG GAGAGAGCGA 
GCTGTAGAAG XCACCATTGT TGTGCACGAC GACATCATTC 
TCCAGCTAAG CGCGftACTGC AAXTTGGAGA ATGGCAGCGC 
TTGCAGGTAT CTTCGAGCCA GCCACGATCG ACATTGATCT 
CTGACAAAAG CAAGAGAACA TAGCGTTGCC TTGGTAGGTC 
GGAACTCTTT GATCCGGTTC CTGAACAGGA TCTATTTGAG 
AAACCTTAAC GCXATGGAAC TCGCCGCCCG ACTGGGCTGG 
AATGTAGTGC TTACGTTGTC CCGCATTTGG TACAGCGCAG 
AAXCGCGCCG AAGGATGTCG CTGCCGACTG GGCAATGGAG 
CCCAGTATCA GCCCGTCATA CTTGAAGCTA GACAGGCTTA 
GAAGAAGATC GCTTGGCCTC GCGCGCAGAT CAGTTGGAAG^ 
CTACGTGAAA GGCGA6ATCA CCAAGGTA CT qGGCAA ^aaT 



ggtcttaaag 



ctxgxTGAAT 



gaaaattgga gctagtagaa 
Tffi^«ri« "TCACXGQAG^ 
tgxtaatggg cacaaatttt ctgtcagtgg agagggtgaa 

CATACGGAAA ACXTACCCTT AAATTXATTT GCACTACTGG 
GTTCCtTGGC CAACACTTGT CACTACTTTC TCTTATGGTG 
TTCAAGATAC CCAGATCATA XGAAGCGGCA CGACTTCTTC 
TGCCXGAGGG ATACGTGCAG GAGAGGACCA TCTCTTXCAA 
AACTACAAGA CACGTGCTGA AGXCAAGTTT GAGGGAGACA 
CAGGATCGAG CTTAAGGGAA TCGATTTCAA GGAGGACGGA 
GCCACAAGTT GGAATACAAC TACAACTCCC ACAACGTATA 
GACAAACAAA AGAATGGAAT CAAAGCXAAC TTCAAAATTA 
.TGAAGATGGA AGCGTXCAAC TACStAGACCA TTATCAACAA 
TXGGCGATGG CCCTGTCCTT XTACCAGACA ACCAXTACCT 
TCTGCCCTTT CGAAAGAXCC CAACGAAAAG AGAGACCACA 
TSASTT7G? ft ACAGCTGCTG GGATXACACA TGGCATGGAT 
AATAAfe gCt C *&oa 
FLARE16-S1.seq Length: 1953



gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG

& SKSS SSSS 

£aaa isssa m sjSgg 
ssss sss ssss ssss ssss 

GACCGTAAGG CTTGATGAAA CAACGCG3CG ^TTJATC AACG^CTTt 
TGGAAACTTC GGCTTCCCCT GGAGAGAGCG ^TCTCCG JJCTOTAGAJ 
CTCACCATTG TTGTGCACGA CGACATCAM CCGTGGCGTT ATCCAJ^J 
GCGCGAACTG CAATTTGGAG AATGGCAGCG CAATGACATT ^gAGGTA 
TCTTCGAGCC AGCCACGATC GACATTGATC T^CTATCTT ^JGACAAAA 
GCAAGAGAAC ATAGCGTTGC CTTGGTAGGT CCAGCGGCGG AGGggCTT 
rGATCCGGTT CCTGAACAGG ATCTATTTGA «5CGC7AAAT GAAACCTTAA 
2GCTATGGAA CTCGCCGCCC GACTGGGCTG GCGATGAGCG AAATGT^TG 
TTACGTTGT CCCGCATTTG GTACAGCGCA GTAACCGGCA AAATCGCGCC 
GCTGCCGACT GGGCAATGGA GCGCCTGCCG ««™^ 
r^CCGTCAT ACTTGAAGCT AGACAGGCTT ATCTTGGACA AGAAGAAGAT 

SSSS? cSSS ^snss£ 

MsrreaeMC ACEAAGCTAG TaGGCAA&a acttgttgaa ggaaa«ttgg 



B S gO B « Eg 

rrfcarACTTG TCACTACTTT CTCTTATGGT GTTCAAI^a „ 

£££££ ACGACTTCTT JAAOKCGCC ATGCCTg^ 

SSSSSS ££2£ SSSS ££3£ SSS£ 

£££££ ISS^ AAACATCCTC G^CAA^ 
TGGAATACAA CTACAACTCC CACAACGTAT ACATCACGGC AGACAAAGAA 
SS tSa^TAA CTTCAAAATT AGACACAACA tTSAAGATGG 
SScttSa C^CACACC ATTATCMCA AAATACTCCA ATTGGCffiTG 

s ss^ = ss SSgL 

Staatcat tt tcttgttcta tcaagagggt gctattgctc ctttctttu < 
^JSSS tStttacta gtatttiact tacatagact tttttgttt*! ^ 
r _ gS5[ SS ^ BSSZiasm httgcattta TTCAT^a?^ 



0^ 



Figure 29 



WO 00/07431 



42/49 



PCT/US99/17806 



FLARE16-S2.seq Length: 1985

gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG
GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG AATTAACCGA
TCGACGTGCa AGCGGACATT TATTTTaAAT TCGATAATTT TTGCAAAAAC
ATTTCGACAT ATTTATTTAT TTTATTATTA TGAGAATCAA TCCTACTACT

GACTCAACTA TCAGAGGTAG TTGGCGTiAT CGAGCGCCAT CTCGAACCGA 
CGTXGCTGGC CGTACATTTG TACGGCTCCG CAGTGGAXGG CGGCCTGAAG 
CCACACAGXG ATATTGATTT GCTGCTTACG GTGACCGTAA GGCTTGATGA 
AACAACGCGG CGAGCTTTGA TCAACGACCT TTXGGAAACT TCGGCTTCCC 
CTGGAGAGAG CGAGAXTCTC CGCGCTGTAG AAGTCACCAT XGTTGTGCAC 
GACGAGATGA TTCCGTGGCG XTAXCCAGCT AAGCGCGAAC TGCAATTTGG 
AGAATGGCAG CGCAATGACA TTCTTGCAGG TATCTTCGAG CCAGCCACGA 
TCGACATTGA TCTGGCTATC TTGCTGACAA AAGCAAGAGA ACATAGCGTT 
SCCTXGGXAG GTCCAGCGGC GGAGGAAC7C TTTGATCCGG 7TCCTGAACA 
GGAXCTATTT GAGGCGCTAA ATGAAACCTT AACGCTATGG AACTCGCCGC 
CCGACXGGGC XGGCGATGAG CGAAATGXAG TGCTXACGTT GTCCCGCATT 
TGGTACAGCG CAGTAACCGG CAAAAXCGCG CCGAAGGATG TCGCTGCCGA 
CTGGGCAATG GAGCGCCXGC CGGCCCAGTA TCAGCCCGTC ATACTTGAAG 
CTAGACAGGC TTATCTxGSA CAAGAAGAAG ATCGCTTGGC CTCGCGCGCA 
GATCAGTTG G AAGAATTTGT CCACTACCTG AAAGGCGAGA TCACCAAGGt 
IGycrGGCAAM gaact tgttg a agg aaaatt ggagct agta g aaggtctta 
aagtcgccfc T GgctAGXAiA GGA^GaAc TTTfSSCTGG KGtttGHtCCk 
ATXCTXGTTG AATTAGAXGG XGATGTXAAT GGGCACAAAT TTTCTGTCAO 
rGGAGAGOCT GAAGGTGATG CAACATACGG AAAACTTACC CTTAAATTTA 
rTTGCACTAC TGGAAAACTA CCTGTTCCtT GGCCAACACT TGTCACTAC7 
ITCTCTTATG GXGTXCAAXG CTTTXCAAGA TACCCAGATC ATATGAAGCG 
SCACGACTIC TTCAAGAGCG CCATGCCTGA GGGAXACGTG CAGGAGAGGA 
^CATCTCTTT . CAAGGACGAC GGGAACTAUA' AGSCACGTGC TGAAGICAAG 
TTTGAGGGAG ACACCCTCGT CAACAGGAIC GAGCTTAAGG GAATCGATTX 
CAAG6AGGAC GGAAACATCC TCGGCCAGAA CTTGGAATAC AACTACAACT 
XCACAACGT ATACATCACG GCAGACAAAC AAAAGAAXGG AATCAAAGCT 
&ACTTCAAAA TTAGACACAA CATXGAAGAT GGAAGCGTTC AACTAGCA^ 
XAXTAXCAA CAAAATACTC CAAXXGGCGA TGGCCCTGTC CTTTTACCAG 
^CAACCATTA CCTGTCCACA CAATCTGCCC TTTCGAAAGA TCCCAACGAA 
^AGAGAGACC ACATGGTCCX ^rwfz^rzn^v r?mzriczr<vK rrcraaTTur 

^az^^is ^^agg t_acaaataa^ f^* a ?*gffi Arccr<K£CT 



AGTCTATAGG AGGTTTTGAA AAGAAAGGAG CAATAATCAT XTXCTTGT7C 
TAXCAAGAGG GXGCTATTGC XCCTTTCTTT TTTTCTTTTT ATTTATTTAC 
XAGTATTTTA CTTACAXAGA CTTTXTTGZl T&rrRTTBTaft an^flftpary: 



AGAGGTTATT TTCTTGCATT TATTCAT^a zact* 



Figure 30 
FLARE11-S.seq Length: 1595

ccatgggggc tagcgaacaa aaactcattt ctgaagaaga cttgcctagc 



GAAGCGGTGA TCGCCGAAGT ATCGACTCAA CTATCAGAGG TAGTTGGCGT 
CATCGAGCGC CATCTCGAAC CGACGTTGCT GGCCGTACAT TTGTACGGCT 
CCGCAGTGGJ* TGGCGGCCTG AAGCCACACA GTGATATTGA TTTGCTGGTT 
ACGGTGACCG TAAGGCTTGA TGAAACAACG CGGCGAGCTT TGATCAACGA 
CCTTTTGGAA ACTTCGGCTT CCCCTGGAGA GAGCGAGATT CTCCGCGCTG 
TAGAAGTCAC CATTGTTGTG CACGACGACA TCATTCCGTG GCGTTAXCCA 
SCTAAGCGCG AACTGCAATT TGGAGAATGG CAGCGCAATG ACATTCTTGC 
AGGTATCTTC GAGCCAGCCA CGATCGACAT TGATCTGGCT ATCTTGCTGA 
CAAAAGCAAG AGAACATAGC GTTGCCTTGG TAGGTCCAGC GGCGGAGGAA 
CTCTTTGATC CGGTTCCTGA ACAGGATCTA TTTGAGGCGC TAAATGAAAC 
CTTAACGCTA TGGAACTCGC CGCCCGACTG GGCTGGCGAT GAGCGAAATG 
IAGTGCTTAC GTTGTCCCGC ATTTGGTACA GCGCAGTAAC CGGCAAAATC 
GCGCCGAAGG ATGTCGCTGC CGACTGGGCA ATGGAGCGCC TGCCGGCCCA 
GTATCAGCCC GTCATACTTG AAGCTAGACA GGCTTATCTT GGACAAGAAG 
AAGATCGCTT GGCCTCGCGC GCAGATCAGT TG GAAGAATP TGTfiCACTAp 
CTGAAAGGCG AGATCACCAA GGTAf^aGGC AAAbaactta cagttgaagg 
aaaattqqaq qtcgcc& tkq ctAGTAAAGG AGAAGAACTT MCACT&&G 
TTGTCCCAAT TCTTGTTGAA TTAGATGGTG ATGTTAAXGG GCACAAATTT 
rCTGTCAGTG GAGAGGGTGA AGGTGATGCA AGATACGGAA AACTTACCCT 
TAAATTTAIT TGCACTACTG GAAAACTACC TGTTCCtTGG CCAACACTTG 
rCAcTACTTT CTCTTATGGT GTTCAATGCT TTTCAAGATA CCCAGATCAT 
&TGAAGCGGC ACGACTTCTT CAAGAGCGCC ATGCCTGAGG GATACGTGCA 
SGAGAGGACC ATCTCTTTCA AGGACGACGG GAACTACAAG ACACGTGCTG 
ftAGTCAAGTT TGAGGGAGAC ACCCTCGTCA ACAGGATCGA GCTTAAGGGA 
ATCGATTTCA AGGAGGACGG AAACATCCTC GGCCACAAGT TGGAATACAA 
CTACAACTCC CACAACGTAT ACATCACGGC AGACAAACAA AAGAATGGAA 
TCAAAGCTAA CTTCAAAATT AGACACAACA- TTCJAAGATGG AAGCGTTCAA 
CTAGCAGACC ATTATCAACA AAATACTCCA ATTGGCGATG GCCCTGTCCT 
rTTACCAGAC AACCATTACC TGTCCACACA ATCTGCCCTT TCGAAAGATC 
SCAACGAAAA GAGAGACCAC ATGCTCCTTC TTGAGT TTGT AACAGCTGCT, 
SGGATTACAC, J^TGGCATGGA _TGAACTATAC AAATAAfcart ctacra ~ 

■ xur 
FLARE11-S3.seq Length: 1961

gagctcGCTC CCCCGCCGTC GTTCAATGAG AATGGATAAG AGGCTCGTGG
GATTGACGTG AGGGGGCAGG GATGGCTATA TTTCTGGGAG GGAGACCACA
ACGGTTTCCC aCTAGAAATA ATTTTGTTTA ACTTTAAGAA GGAGATATAC
ATATGGCaAG CATGACTGGT



gaagaagact tgcctag 



CCGTACAXTT 
GATATTGATT 
GCGAGCTTTG 
GCGAGATTCT 
ATTCCGTGGC 
GCGCAATGAC 
ATCTGGCTAT 
GGTCCAGCGG 
TGAGGCGCTA 
CTGGCGATGA 
GCAGTAACCG 
3GAGCGCCTG 
CTTATCTTGG 
SAAGAATTTG 



gttggccJtca 
gtacggctcc 
tgctggttac 
atcaacgacc 
ccgcgctgta 
gttatccagc 
attcttgcag 
cttgctgaca 

CGGAGGAACT 
AATGAAACCT 
GCGAAATGTA 
GCAAAATCGC 
CCGGCCCAGT 
ACAAGAAGAA 
TCCACTACGT 



RBCGSTGEftT 

TCGAGCGCCA 
GCAGTGGATG 
GGTGACCGTA 
TTTTGGAAAC 
GAAGTCACCA 
TAAGCGCGAA 
GTATCTTCGA 
AAAGCAAGAG 
CTXTGATOCG 
TAACGCTATG 
GTGCTTACGT 
GCCGAAGGAT 
ATCAGCCCGT 
GATCGCTTGG 
GAAAGGCG3VG 



gcgaaca aaa 
B5C5RKST3H 

TCTCGAACCG 
GCGGCCTGAA 
AGGCTTGATG 
TTCG3CTTCC 
TTGTTGTGCA 
CTGCAATTTG 
GCCAGCCACG 
AACATAGCGT 
GTTCCTGAAC 
GAACTCGCCG 
TGTCCCGCAT 
GTCGCTGCCG 
CATACTTGAA 
CCTCGCGCGC 
ATCAj 



actcatttct 
"CSAdTCAAi 1 



a 

o 



ACGTTGCTGG 
GCCACACAGT 
AAACAACGCG 
CCTGGAGAGA 
CGACGACATC 
GAGAATGGCA 
ATCGACATTG 
TGCCTTGGTA 
AGGATCTATT 
CCCGACTGGG 
TTGGTACAGC 
ACTGGGCAAT 
GCTAGACAGG 
AGATCAGTTG 



nact^gca^ gtr^aaogaa aattggaggt 



Tin 

GTTAATGGGC 
A7ACGGAAAA 
TTCCtTGGCC 
TCAAGATACC 
GCCTGAGGGA 
ACTACAAGAC 
AGGATCGAC3C 
CCACAAGTTG 
&CAAACAAAA 
GAAGATGGAA 
rGGCGATGGC 
CTGCCCTTTC. 
SAGT TTGTAA. 
MAA&JCtCt 



CACTGGAGTT 
ACAAATTTTC 
CTTACCCTTA 
AACACTTGTC 
CAGATCATAT 
TACGTGCAGG 
ACGTGCTGAA 
TXAAGGGAAT 
GAATACAACT 
GAATGGAAXC 
GCGTTCAACT 
CCTGTCCTTT 
GAAAGATCCC 
CAGCTGCTGG 



GTCCCAATTC 
TGTCAGTGGA 
AATTXATTTG 
ACTACTTTCT 
GAAGCGGCAC 
AGAGGACCAT 
GTCAAGTTTG- 
CGATrTCAAG 
ACAACTCCCA 
AAAGCTRACT 
AGCAGACCAT 
TACCAGACAA 
AACGAAAAGA 
GATTAC&CAT 



SGqcz 
GAATtr 



GAGGGTGAAG 
CACTACTGGA 
CTTATGGTGT- 
GACTTCTTCA 
CTCTTTCAAG 
AGCGAGACAC 
GAGGACGGAA 
CAACGTATAC 
TCAAAATTAG 
TATCAACAAA 
CCATTACCTG 
GAGACCACA7 
GGCATGGATG 



AGTAAAl 
AGATGGTGAT 
GTGA^GCAAC 
AAACTACCTG 
TCAATGCTTT 
AGAGCGCCAT 
GACGACGGGA 
CCTCGTCAAC 
ACATCCTCGG 
ATCACGGCAG 
ACACAACATT 
ATACTCCAAT 
TCCACACAAT 
GGTCCTTCTX 
AACTATACAA 



SESSKSCAM 
ttctxttttx 

rTTGTTTAGA 



CAT(j aaagct t. 



S5f£ATTTTC 
CTTTTTATTT 
TTATAGAAAA 



TSSCCEKETC 
TTGTTCTATC 
ATTTACTAGT 
AGAAGGRGAG 



TATAGGAUaT 
AAGAGGGTGC 
ATTTTACTTA 
GTTATTTTCT 



TTT5S3COSSS 
TATTGCTCCT 
CATAGACTTT 
TGCATTTATT 



or 
asBKhSBBa vchtcdec&i gTSsefggee ggagsmcrrBegsgreeT

GCTTCATGCA GGCGAGTTGC AGCCTGCAAT CCGAACTGAG GACGGG7TTT 
TGGAGTTAGC TCACCCTCGC GAGATCGCGA CCCTrrGTCC CGCCCATTGT 
AGCACGTGTG TCGCCCAGGG CATAAGGGGC ATGATGACTT GGCCTCAXCC 
TCTCCTTCCT CCGGCTTAAC ACCGGCGGTC TGTTCAGGGT TCCAAACTCA 
TAGTGGCAAC TAAACACGAG GGTTGCGCTC GTTGCGAGAC TTAACCCAAC 
ACCTTACGGC ACGAGCTGAC GACAGCCATG CACCACCTGT OTCCGCGTTC 
CCGAGGGCAC CCCTCTCTTT CAAGAGGATT CGCGGCATGT CAAGCCCTGG 
TAAGGTTCTT CGCTTTGCAT CGAA7TAAAC CACATGCTCC ACCGCTTGTG 
CGGGCCCCCG TCAATTCCTT TGAGTTTCAT XCTTGCGAAC GTACTCCCCA 
GGCGGGATAC TTAACGCGTT AGCTACAGCA CTGCACGGGT CGAGTCGGAC 
AGCACCTAGT ATCCATCGTT TACGGCTAGG ACTACTGGGG TCTCTAATCC 
CATTTGCTCC CCTAGCTTTC GTCTCTCAGT GTCAGTGTCG GCCCAGCAGA 
GTGCTTTCGC CGTTGGTGTT CTTTCCGATC TCAATGCATT TCACCGCTCC 
ACCGGAAATT CCCTCIGCCC CTACCGTACT CCAGCTTGGT AGTTTCCACC 
GCCTGTCCAG GGTTGAGCCC TGGGATTTGA CGGCGGACTT GAAAAGCCAC 
C7ACAGACGC TTTACGCCCA ATCATTCCGG ATAACGCTTG CATCCTCTGT 
CTTACCGCGG CTGCTGGCAC AGAGTTAGCC GATGCTTATT CCTCAGATAC 
CGTCATTCTT TCTTCTCCGA GAAAAGAAGT TGACGACCCG TGGGCCTTCC 
ACCTCCACGC GGCATTGCTC CGTCAGGCTT TCGCCCATTG CGGAAAATTC 
CCCACTGCTG CCTCCCGTAG GAGXCTGGGC CGTGTCTCAG TCCCAGTGTG - 
GCTGATCATC CTCTCGGACC AGCTACTGAT CATCGCCTTG GTAAGCTATT-; !$L 
GCCTCACCAA CTAGCTAATC AGACGCGAGC CCCTCCTTGG GCGGATTTCT[ 
CCTTTTGCTC CTCAGCCTAC GGGGTATTAG CAACCGTXTC CAGXTGXTGT 
TCCCCTCCCA AGGGCA£GTT CTTACGCGTT ACtCACCCGT TCGCCACTGG 
AAACACCACT TCCCGTTCGA CTTGCATGTG TTAAGCATGC CGCCAGCGTT 
CATCCTGAGC CAGGATCGAA CTCTCCATGA G£TT ( CATAGT TGCAOTACTT 
ATAGCTTCCT TATTCGTAGA CAAAGCGGAT "TCCfeAATTGT CTTTCCTJCC 
AAGGAXAACT TGTATCCATG CGCTTCAGAT TATTAGCCTG GAGTTCGCCA 
CCAGCAGTAT AGCCAACCCT ACCCTATCAC GTCAATCCCA CAAGCCTCTT 
ATCCATTCCC GTTCGATCGT GGCGGGGGGh GTAAGTCAAA ATAGAAAAAA 
CTCACATTGG GTTTAGGGAT AATCAGGCTC GAACTGATGA CTTCCACCAC 
GTCAAGGTGA CACTCTACCG CTGAGTTATA TCCCTTCCSC GTCCCCTCGA 
GAAAGAGAAT. TACCGAATCC TAAGGCAAAG GGGCGAGAAA CTCAAGGCCA 
CCCXTCCTCC GGGCTTTCTT TCCACACTAT TATGGEATAGT CAAATAATGG 
GAAAAATTGG ATTCAATTGT CAACCGGTCC TATCGAAAAT AGGATTGACT 
ATGGATTCGA GCCATAGCAC ATGGTTTCA T AAAATCTGTA CGATTTTCCC 
ttrCTAAAVn ^afiT&^CTTT CCATSAAGAJif' oatcgacaat: atcciataapc 



s 



ttgcatgcgt gcaggtCUAA TCTSSCTCTT CTTTCTTgTT TCAATSA1AT 



TATTATTTCA AAGATAAGAG ATATTCAAAG ATAAGAGATA AGAAGAAGTC 
AAAATTTGAT TTTTTTTTTG GAAAAAAAAA ATCAAAAAGA TATAGTAACA 
TTAGCAAGAA GAGAAACAAG TTCTATTTCA CAATTTAAAC AAATACAAAA 
TCAAAATAGA ATACTCAATC ATGAATAAAT GCAAGAAAAT AACCTCTCCT 
TCTTTTTCTA TAATGTAAAC AAAAAAGTCT ATGTAAGTAA AATACTAGTA 
AATAAATAAA AAGAAAAAAA GAAAGGAGCA ATAGCACCCT CTTGATAGAA 
rAAGAAAATG RTTATTfiCTC PTTTrTTTTC AAAACCTCCT ATACRCT&GG 

ISCAAA 



CCTTTCAtfcT AGTGGACAAA TTCTTCCAAC TGATCTGCGC GCGAGGCCAA 
GCGATCTTCT TCTTGTCCAA GATAAGCCTG TCTAGCTXCA AGTAIGACGG 
GCTGATACTG GGCCGGCAGG CGCTCCATTG CCCAGTCGGC AGCGACATCC 
TTCGGCGCGA TTTTGCCGGT TACTGCGCTG TACCAAATGC GGGACAACGT 
AAGCACTACA TTTCGCTCAT CGCCAGCCCA GTCGGGCGGC GAGTTCCATA 
GCGTTAAGGT TTCATTTAGC GCCTCAAATA GATCCTGTTC AGGAACCGGA 
TCAAAGAGTT CCTCCGCCGC TGGACCTACC AAGGCAACGG TATGTTCTCT 
I JTGCTTTTGTC AGCAAGATAG CCAGATCAAT GTCGATCGTG GCTGGCTCG^ 
4 AGATACCTGC
CGCTTAGCTO 
GACTTCTACA 
CCAAAAGGTC 
GTCACCGTAA 
CACTGCGGAG 
GCTCGATGAC 
ACCGCTTCCC 
TTCGCCCGGA 



AAGAATGTCA 
GATAACGCCA 
GCGCGGAGAA 
GTTGATCAAA 
CCAGCAAAXC 
CCGTACAAAT 
GCCAACTACC 
TCATGgATCC 
GTTCGCTCCC 



TTGCGCTGCC 
CGGAATGATG 
TCTCGCTCTC 
GCTCGCCGCG 
AATATCACTG 
GTACGGCCAG 
TCTGATAGTT 
CTCCCTACAA 
AGAAATATAG 



ATTCTCCAAA 
TCGTCGTGCA 
TCCAGGGGAA 
TTGTTTCATC 
T6XGGCTTCA 
CAACGTCGGT 
GAGTCGATAC 
CTGTATCCAa 
CCATCCCTGC 
TgflflggACGG 



ttgggtaccg agctcgaatt cctgcagccc 



AACTGGGGCT 
TTCCACGCCC 
AGACAGAATT
CTCTGTAGAA
rCAACACAAC
GAAAGCCAGT 
GATAAGCTCA 
rGGGAAGTTT 
SGAAAAGGTT 
&GGAAGAGGG 
fcAATAAGTCG 
rCGAAAAGGA 
ACCGAGAAAG 
rTGGTAAAAG 
TAGAACATGA 
3TGGAAGAAA 
3AATTGAACS 
3AGGGACAGG 
^AAAACCCAA 



ACATTTCTTT TCAATTTCCA 
TTTTTTGAGA CCTCGAAACA 
ACATACftAGA AAAAGGATAA 
ATTTATGAAT TTCATAGTAA 
TCGAACTTGC TATCCTCTTG 
AGAATGATTC ATTCGGATCG 
AAXCCATGTT CCATATTTGA 
TACAATCCTC TTCCTGCTGA 
AATGGAGGAC TGGJGCCGAC 
CCGGGATCGC TAACTAATAG 
GAAATAGATA TetagctagA 
AATTGAAAAG AACTGTCTTT 
GCGGGTCTTA TGCAAXCGAT 
ATAGGTCATC GAAAGGATCT 
TAGAAAATGG ATTCCTATTT 
CATTAACCCG TCAATTTTGG 
CGGGAAGAAA TTGGAATGGA 
CTCTATTGAT GCAAACGCTG 
.AAAAATCGAA ATGAAATAAAT 
AAGATAGAAG AGCCCAGATX 
TCCTTCTGAT TCTCAAAGAA 
ATTTCTTCTT ATTATAAGAC 
AACAATCTTC TCCTTTAATC 
AAACGTGACT CAATTGGTCT 
GGGCGAAGAC TCTCGAAC6A 
AGGAGCCGTA TTAGGTGAAA 
AAGGGTGACT ^ATCTGTOGA 



CTCTGCCTTA 



TTCAAGAGTT 
TGAAATGGAC 
TGGTAGCCCT 
TAGAAATCCA 
CCTAATAGGC 
ATATGAGGAC 
AGAGGGTTGA 
GCCCCCTTTC 
AGTTCATCAC 
AATAGTACTA 
AAXAGAAACA 
TCTGTATACT 
CGGATCATAT 
CGGACGACTC 
GAAGAGTGCC 
ATCCAATTCG 
ATAATATAGA 

tacctagagg 
taSagaataa 
ccaaatgaag 
tgaggggcaa 

GTGATTTGAT 
ATAAATGGAA 
TAGTTAGTCT 
GGAAAAGGAT 
ATCTCATGTA 
CTTTTCCACT 



TTGCAGTTCG

CAACAATGGT 

GCCGAAGTTT 

AAGCCTTACG 

GGCCGCCATC 

TCGAGATGGC 

TTCGGCGAXC 

GCGCTTCgTA 

CCCCTCACCT 

CGGGGGAGCfc 



TCTTMCTCT 
AAATTCCTTC 
CCCATTAACT 
TGTCCTACCG 
AAAGATTGAC 
CCAACTACGT 
CCTCTGTGCT 
TCCTCGCTCC 
GGAAGAAAGA 
CTAACTAATA
ACTAATATAT
TTCCCCGTTC
AGATATCCCT
ACCAAAGCAC
TAACCGCATG
GGATTTTTCT
TTCATACAGA
ATAGGGATAG
AGCAAAAAAA
AAATGGAAAC
GGGGATTGAT
CCGCATATGT 
AGTGTTCAAT 
TCGGGACGGft 
CCCTTCGAAA 
CGATTCTGTA 
ATCAACCCCfl 



Figure 33B 



pMSK49.seq Length: 5263 



Figure 34A 



1

51 
101 
151 
201 
251 
301 
351 
401 
451 
501 
551 
601 
651 
701 
751 
801
851 
901 
951
1001 
1051 
1101 
1151 
1201 
1251 
1301 
1351 
1401 
1451 
1501 
1551 
1601 
1651 
1701 
1751 
1801 
1851 
1901 
1951 
2001 
2051 
2101 
2151 
2201 
2251 
2301 
2351 
2401 
2451 
2501 
2551 
2601 
2651 
2701 



SGGAACGGAT 
GCTTCAT6CA 
TGGAGTTAGC 
AGCACGTGTG 
TCTCCTTOCT 
TAGTGGCAAC 
ACCTTACGGC 
CCGAGGGCAC 
TAAGGTTCTT 
CGGGCCCCCG 
GGCGGGATAC 
AGCACCTAGT 
CATTTGCTCC 
GTGCTTTCGC 
ACCGSAAATT 
GCCTGTCCAG 
CTACAGACGC 
CTTACCGCGG 
CGTCATTGTT 
ACCTCCACGC 
CCCACTGCTG 
GCTGATCAXC 
SCCTCACCAA 
CCTTTTGCTC 
rCCCCTCCCA 
WIACACCACT 
CATCCTGAGC 
ATAGCTTCCT 
&AGGATAACT 
CCAGCAG?AT 
ATCCATTCCC 
CTCACATTGG 
STCAAGGTGA 
3AAAGAGAAT. 



3CCTTCCTCC 
3AAAAAITGG 
fiXGGRTXCGA 
SATCTAAA3LC 



.CAAAAAA 
iGG 
:qTTTCT. 



TCACCGCCGT 
GGCGAGTTGC 
TCACCCTCGC 
TCGCCCAGGG 
CCGGCTTAAC 
TAAACACGAG 
ACGAGCTGAC 
CCCTCTCTTT 
CGCTTTGCAT 
TCAATTCCTT 
TTAACGCGTT 
CCTAGCTTTC
CGTTGGTGTT 
CCCTCTGCCC 
GGTTGAGCCC 
TTTACGCCCA 
TCTTCTCCGA
CACTCTACCG
GGGCTTTCTT
ATTCAATTGT
GASCAGGTtl
&GGGCAGATT
ZATCTTCAAT
rGTTTGTCTG
rCCGTATGTT



GTCTATGTAA 
GTGTGGACAG
CCGTGATGTA
CGTCGTCCTT
GCGCTCTTGA
TATGGATAGT
gatcgacggt
TCCWCWTT 

AGTAAATAAA 
AGAACAAGAA 
TAGGCCAGGA 
TTTCGTTGGG
TCTGGTAAAA
ATAAATTTAA
ACXGACAGAA
GACGGGTTTT
CGCCCATTGI 
GGCCTCATCC 
TCCAAACTCfl 
TTAACCCAAC 
GTCCGCGTTC 
CAAGCCCTGG 
ACCGCTTGTG 
GTACTCCCCA 
CGAGTCGCAC 
TCTCTAATCC 
GCCCAGCAGA 
TCACCGCTCC 
AGTTTCCACC 
GAAAAGCCAC 
CATCCTCTGT
CCTCAGATAC 
TGGGCCTTCC 
CGGAAAATTC 
TCCCAGTGTG 
GTAAGCTATT 
GCGGAXTTCT 
CAGTTGTTGT 
TCGCCACTGG 
CGCCAGCGTT 
TGCATTACTT 
CTTTCCTTCC 
GAGTTCGCCA 
CAAGCCTCTT 
ATAGAAAAAA 
CTTCCACCAC 
GTCCCCTCGA 
CTCAAGGCCA 
CAAATAA7GG 
AGGATTGACT 

atcgataagc 

TAAAAAGAAA 
A ATGATTATT 
TCpctctaga 






GCAGCTGTTA 
ATCTTTCGAA 
TGTATTCCAA
SScSltc CT^qcaacc tccaattttc cpcffggg.
5E3w4 CCaSffi luESLll 5^11^5 

foT^^TTC CRA CTGATCTGCG CGCGAGGCCA AGCGATCTTC TTCTTGTCCA 
SSSS? SSSc AAGTATGACG GGCTGATACT GGGCCGGCJG 

sSctccatt gcccagtcgg cagcgacatc cttcggcgcg awttgccgg 

SaSgCGCT GTACCAAATG CGGGACAACG TAAGCACTAC JTTTCGCTCA 
TCGCQAGCCC AGTCGGGCGG CGAGTTCCAT AGCGTTAAGG TTTCATTlAG 
CGCCTOttAT AGATCCTGTT CAGGAACCGG ATCAAAGAGT TCCTCCGCCG 

Sggacctac caaggcaacg ctatgttctc ttgcttttgt ^^gata< 
SEagatcaa tgicgatcgt ggctggctcg aagatacctg ^amgtc 

ATTGCGCTGC CATTCTCCAA ATTGCAGTTC ' GCGCTTAGCT GGATA*1CGCC 

SgStSt gtcgtcgtgc acaacaatgg tgacttctac a^ckaga 3 

WCTCGCTCT CTCCAGGGGA AGCCGAAGTT TCCAAAAGGT CGTTGATCM 
RGCTCGCCGC GTTGTTTCAT CAAGCCTTAC GGTCACCGTA ACCAGCAAAT 
CAATATCACT GTGTGGCTTC AGGCCGCCAT CCACTGCGGA GCCGTACAM 
TGTACGGCCA GCAACGTCGG TTCGAGATGG CGCTCGAT GA CQCT^CTg g 
r-TfTTGRTAGT tctbtcgaTA g SSSSSSSI ^CGCTTgl Sp&SSBi 
^gggtSSS Jtalfltttt tqttcqctag cffi sTteftgC MlOiLWtt 
to£SS TATCTCCTTC TTA^W AC AAAATTAX XTCTAGtGGG 

SSS gSSccctc ccagaaatat agccatccct gccccctcac 

rrrrAATCCCA CGAGCCTCTT ATOggaggg ATTGA*^ eSCGfiGGg»^-I 
dgL'Scteiaa ttcctacaqe ccgatcg T^ CtlWL\.uAk 
CTaSaWtCT Tl'TCAAITTC CATTCAAGAG TTTCTTATCT GTTTCCACGC 

Stttttga gacctcgaaa catgaaatgg acaaaxxccx tctcttagga' 

RCACATACAA GAAAAAGGAX AATGGTAGCC CXCCCAXIAA CTACCTCATT 
rCATTTATGA StTCATAGT AAIAGAAATC CAXGXCCIAC CGAGACft^A 

Stcgaacw gctatcctct tgcctaatag gcaaagaxxg acctctgtag 

SSaXGAT XCATTCGGAX CGATATGAGG ACCCAACTAC GWGCATTGC 

agaatccatg .ttccatattt gaagagggtt'G«5c?ctgtg cttctcxcat 

SGTACAATCC TCTTCCTGCT GAGCCCCCTT TCTCCTCGGT ^ttGAGAA 

Saatggagg ACTGGTGCCG acagttcatc acggaagaaa gaactcacag 

fcGCCGGGATC GCTAACTAAT AGAAIAGXAC IACTAACXAA TACTAAXAXA 
rAfiAAATAGA TATctageta gAAAXAGAAA CAACTAATAT ATAGAIAATC 
£aaS£ SaaSgxcx txxcxgtata CTTTCCCCGT TCTATT^Cm 

XGCGGGTCT TATGCAATCG ATCGGATCAT ATAGATAXOC CXTCAACACA 
SSagSS TCGAAAGGAT CTCGGACGAC TCACCAAAGC ACGAAAGCOL 
OTAGAAAAT GGATTCCTAT TIGAAGAGTG CCTAACCGCA TGGMAAGCT 
SSttaacc CGTCAATTTT GGATCCAATT cgggaixxxx cttgggaagt 
rTCGGGAAGA AATTGGAATG GAATAATATA GATTCATACA GAGGAAAAGG 
nCTCTATTG ATGCAAACGC TGTACCTAGA GGATAGGGAT AGAGGAAGAG 
SGAAAAATCG AAAXGAASTA AATAAAGAAT AAAGCAAAAA AAAAATAAGT 
CGAAGATAGA AGAGCCCAGA TTCCAAATGA AGAAATGGAA ACTCGAAAAG 
GATCCTTCTG ATTCTCAAAG AATGAGGGGC AAGGGGAXXG ATJCCGAGAA 
AGATTTCTTC TTATTATAAS ACGTGATTTG ATCCGCAXAX GTCTGGIAAA 
AGAACAAXCI TCTCCTTTAA TCATAAATGG- AAAGTGTTCA ATTAGAA^T 
GAAAACGTGA CTCAATTGGT CTTAGTTACT CTTCGGGACG GAGXGGAAGA 
aAGGGCGAAG ACTCtCGAAC GAGGAAAAGG ATCCCTTCGA AAGAATTGAA 
CGAGGAGCCG TATTAGGTGA AAATCTCATG TACGATTCTG TASAGGGACA 
GSAAGGGTGA rT TATCTGTC rACTTTICCA HTATCAACCC CAAAfiAACCC 
RarTCTGCCT TA^~" 